🔒 Network Intrusion Detection

🛡 Prepared By:

👤 Abdullah Ayman -0204302-

👤 Hamad Magairah -0204486-

🎨 Designed using HTML & CSS

📚 Table of Contents

  • 🚀 Data preprocessing
    • Introduction to Network Intrusion Detection
    • Explore Data
    • Data Cleaning
    • Feature Engineering
    • Visualization
    • Correlation
    • Outliers
    • Duplicates
    • Data Splitting
    • Custom Transformer
    • Data Preprocessing
  • 🤖 Machine Learning Modeling
    • Introduction
    • Pre-Defined Useful Function for Visualization
    • Dummy Classifier
    • LogisticRegression Classifier
    • KNeighborsClassifier
    • Support Vector Classifier
    • Decision Tree Classifier
    • Random Forest Classifier
    • Ensamble (Voting ,Stacking, Bagging Classifiers and Try some new classifiers )
    • Comparison Between All classifiers and select the best one.
    • Fine Tune Using Randomized GridSearchCV
    • Final Model
  • 🤖 Neural Network
    • Introduction
    • perceptron
    • Multilayer Perceptron (MLP)
    • Sequential API
    • Functional API
    • Functional API VS Sequential API
    • Keras Tuner
    • Functional API VS Functional_best_model
    • Comparesion between ML & NN

🚀 Data preprocessing

Introduction to Network Intrusion Detection

The dataset provided for auditing represents a simulated military network environment, specifically designed to capture a wide range of intrusions. The data collection involved creating a setup that mimics a typical US Air Force Local Area Network (LAN). The LAN was structured to resemble a real-world environment and subjected to various simulated attacks. Raw TCP/IP dump data was generated during this process.

Data Source:

The data is sourced from a simulated US Air Force LAN environment.

Intrusions are simulated to capture a variety of attack scenarios.

Connection Definition:

A connection is defined as a sequence of TCP packets with a start and end time.

Data flows between a source IP address and a target IP address within a specified protocol.

Features:

Each connection record contains about 100 bytes of information.

A total of 41 features are extracted for each TCP/IP connection, including both quantitative (38 features) and qualitative (3 features) aspects.

Attack Labels:

Each connection is labeled as either "normal" or categorized with a specific attack type.

The class variable has two categories: "Normal" and "Anomalous."

Feature Categories:

Quantitative features provide numerical information about the connections.

Qualitative features offer categorical information, Features cover various aspects related to network connections, likely including details about IP addresses, protocols, and other relevant network attributes.

Class Variable:

The class variable is binary, indicating whether a connection is "Normal" or "Anomalous."

Data Size:

The dataset size is 25,193 samples, 42 features

In [1]:
from IPython.display import Image
Image(filename='what-does-IDS-do-1024x536.png')
Out[1]:

🌐 The Chronicles Unveiled 🕵️‍♂️

Embark on a digital odyssey through the Network Intrusion Detection dataset — a mosaic of 25 metrics chronicling 25,000 records. Unearth the secrets of cyber conflicts, a dance between aggressors and guardians.

📊 Dataset Metrics

  • 🌐 Source: Extracted from the Kaggle's Network Intrusion Detection.
  • 📊 Shape: Encompassing 25,000 data entries, the records are spread across 5 pivotal attributes, crafting a panoramic view of cyber-attack dimensions.
  • 🕰️ duration: Duration of the connection in seconds.
  • 🌐 protocol_type: Protocol used in the connection (e.g., tcp, udp).
  • 🎯 service: Network service on the destination (e.g., http, ftp).
  • 🌐 flag: Status of the connection (e.g., SF - normal, S0 - connection attempt, REJ - connection rejected).
  • 🌐 src_bytes: Number of data bytes from source to destination.
  • 📦 dst_bytes: Number of data bytes from destination to source.
  • 🔍 land: Whether the connection is from/to the same host/port (binary: 1 if connection is from/to the same host/port, 0 otherwise).
  • 🚥 wrong_fragment: Number of wrong fragments.
  • 📤 urgent: Number of urgent packets.
  • 🐛 hot: Number of "hot" indicators.
  • 🚨 num_failed_logins: Number of failed login attempts.
  • 🔍 logged_in: Binary attribute indicating if the user is logged in (1) or not (0).
  • 🐛 num_compromised: Number of compromised conditions.
  • 🚨 root_shell: Binary attribute indicating if root shell is obtained (1) or not (0).
  • 🚨 su_attempted: Binary attribute indicating if 'su root' command attempted (1) or not (0).
  • 🐛 num_root: Number of root accesses.
  • 🚨 num_file_creations: Number of file creation operations.
  • 🐚 num_shells: Number of shell prompts.
  • 📂 num_access_files: Number of operations on access control files.
  • 🔗 num_outbound_cmds: Number of outbound commands in an ftp session.
  • 🌐 is_host_login: Binary attribute indicating if the login is a host login (1) or not (0).
  • 🌐 is_guest_login: Binary attribute indicating if the login is a guest login (1) or not (0).
  • 📊 count: The number of connections to the same host as the current connection in the past two seconds.
  • 📊 srv_count: The number of connections to the same service as the current connection in the past two seconds.
  • 📊 serror_rate: The percentage of connections that have "SYN" errors.
  • 📊 srv_serror_rate: The percentage of connections to the same service with "SYN" errors.
  • 📊 rerror_rate: The percentage of connections that have "REJ" errors.
  • 📊 srv_rerror_rate: The percentage of connections to the same service with "REJ" errors.
  • 📊 same_srv_rate: The percentage of connections to the same service.
  • 📊 diff_srv_rate: The percentage of connections to different services.
  • 📊 srv_diff_host_rate: The percentage of connections to different hosts for the same service.
  • 📊 dst_host_count: The number of connections to the same destination host.
  • 📊 dst_host_srv_count: The number of connections to the same service on the destination host.
  • 📊 dst_host_same_srv_rate: The percentage of connections to the same service on the destination host.
  • 📊 dst_host_diff_srv_rate: The percentage of connections to different services on the destination host.
  • 📊 dst_host_same_src_port_rate: The percentage of connections to the same source port.
  • 📊 dst_host_srv_diff_host_rate: The percentage of connections to different hosts for the same service on the destination host.
  • 📊 dst_host_serror_rate: The percentage of connections that have "SYN" errors to the destination host.
  • 📊 dst_host_srv_serror_rate: The percentage of connections to the same service with "SYN" errors to the destination host.
  • 📊 dst_host_rerror_rate: The percentage of connections that have "REJ" errors to the destination host.
  • 📊 dst_host_srv_rerror_rate: The percentage of connections to the same service with "REJ" errors to the destination host.

</div>

📘 Import Important Libraries

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import re
import plotly.express as px
import seaborn as sns
import sklearn
import warnings
import plotly.express as px
import xgboost 
import tensorflow as tf
import keras_tuner
from tensorflow import keras
import tensorflow as tf
from tensorflow.keras import layers
from kerastuner.tuners import RandomSearch
from kerastuner import HyperParameters
import tensorflow as tf
from tensorflow.keras.layers import Dense, LeakyReLU, concatenate
from kerastuner.tuners import RandomSearch
from kerastuner.engine.hyperparameters import HyperParameters




from plotly.express import scatter
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay, classification_report
from sklearn.metrics import precision_score, recall_score, f1_score, classification_report ,accuracy_score
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC, LinearSVC
from sklearn.linear_model import SGDClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier
from xgboost import XGBClassifier
from sklearn.ensemble import StackingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.inspection import DecisionBoundaryDisplay
from matplotlib.colors import ListedColormap
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import label_binarize
from sklearn.inspection import DecisionBoundaryDisplay
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import precision_recall_curve
from sklearn.metrics import average_precision_score
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import label_binarize
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import roc_curve, auc
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.base import BaseEstimator, TransformerMixin
from scipy.sparse import hstack
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, OrdinalEncoder
from sklearn import tree
from sklearn.metrics import DetCurveDisplay, RocCurveDisplay
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import randint
from sklearn.linear_model import Perceptron
from sklearn.decomposition import PCA
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from tensorflow.keras.models import Sequential
from tensorflow.keras import layers
from tensorflow import keras
import ascii_graph
from sklearn.model_selection import train_test_split
import plotly.figure_factory as ff
from sklearn.metrics import confusion_matrix
import numpy as np
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import Dense, Dropout, Input, concatenate
from tensorflow.keras.optimizers import Adam
warnings.filterwarnings('ignore')
WARNING:tensorflow:From C:\Users\abeda\anaconda3\Lib\site-packages\keras\src\losses.py:2976: The name tf.losses.sparse_softmax_cross_entropy is deprecated. Please use tf.compat.v1.losses.sparse_softmax_cross_entropy instead.

C:\Users\abeda\AppData\Local\Temp\ipykernel_18484\3794676868.py:16: DeprecationWarning: `import kerastuner` is deprecated, please use `import keras_tuner`.
  from kerastuner.tuners import RandomSearch

🔍 Explore the Dataset

Dive into the details of the Cybersecurity dataset in this section.

In [3]:
data=pd.read_csv("NetworkIntrusionDetection.csv")
data.head(5)
Out[3]:
duration protocol_type service flag src_bytes dst_bytes land wrong_fragment urgent hot ... dst_host_srv_count dst_host_same_srv_rate dst_host_diff_srv_rate dst_host_same_src_port_rate dst_host_srv_diff_host_rate dst_host_serror_rate dst_host_srv_serror_rate dst_host_rerror_rate dst_host_srv_rerror_rate class
0 0 tcp ftp_data SF 491 0 0 0 0 0 ... 25 0.17 0.03 0.17 0.00 0.00 0.00 0.05 0.00 normal
1 0 udp other SF 146 0 0 0 0 0 ... 1 0.00 0.60 0.88 0.00 0.00 0.00 0.00 0.00 normal
2 0 tcp private S0 0 0 0 0 0 0 ... 26 0.10 0.05 0.00 0.00 1.00 1.00 0.00 0.00 anomaly
3 0 tcp http SF 232 8153 0 0 0 0 ... 255 1.00 0.00 0.03 0.04 0.03 0.01 0.00 0.01 normal
4 0 tcp http SF 199 420 0 0 0 0 ... 255 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 normal

5 rows × 42 columns

In [4]:
sns.set(style="whitegrid")  # Set the style of the plot

plt.figure(figsize=(12, 8))
sns.histplot(data['class'], bins=50, kde=True, color="skyblue", edgecolor="black")

# Add labels and title
plt.xlabel("Detection")
plt.ylabel("Distribution")
plt.title("Distribution of Data")

# Add grid for better readability
plt.grid(axis='y', linestyle='--', alpha=0.7)

# Show the plot
plt.show()

Great! Our label in this data is balanced between "anomaly" and "normal".

In [5]:
data.info()
data.shape
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 25192 entries, 0 to 25191
Data columns (total 42 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   duration                     25192 non-null  int64  
 1   protocol_type                25192 non-null  object 
 2   service                      25192 non-null  object 
 3   flag                         25192 non-null  object 
 4   src_bytes                    25192 non-null  int64  
 5   dst_bytes                    25192 non-null  int64  
 6   land                         25192 non-null  int64  
 7   wrong_fragment               25192 non-null  int64  
 8   urgent                       25192 non-null  int64  
 9   hot                          25192 non-null  int64  
 10  num_failed_logins            25192 non-null  int64  
 11  logged_in                    25192 non-null  int64  
 12  num_compromised              25192 non-null  int64  
 13  root_shell                   25192 non-null  int64  
 14  su_attempted                 25192 non-null  int64  
 15  num_root                     25192 non-null  int64  
 16  num_file_creations           25192 non-null  int64  
 17  num_shells                   25192 non-null  int64  
 18  num_access_files             25192 non-null  int64  
 19  num_outbound_cmds            25192 non-null  int64  
 20  is_host_login                25192 non-null  int64  
 21  is_guest_login               25192 non-null  int64  
 22  count                        25192 non-null  int64  
 23  srv_count                    25192 non-null  int64  
 24  serror_rate                  25192 non-null  float64
 25  srv_serror_rate              25192 non-null  float64
 26  rerror_rate                  25192 non-null  float64
 27  srv_rerror_rate              25192 non-null  float64
 28  same_srv_rate                25192 non-null  float64
 29  diff_srv_rate                25192 non-null  float64
 30  srv_diff_host_rate           25192 non-null  float64
 31  dst_host_count               25192 non-null  int64  
 32  dst_host_srv_count           25192 non-null  int64  
 33  dst_host_same_srv_rate       25192 non-null  float64
 34  dst_host_diff_srv_rate       25192 non-null  float64
 35  dst_host_same_src_port_rate  25192 non-null  float64
 36  dst_host_srv_diff_host_rate  25192 non-null  float64
 37  dst_host_serror_rate         25192 non-null  float64
 38  dst_host_srv_serror_rate     25192 non-null  float64
 39  dst_host_rerror_rate         25192 non-null  float64
 40  dst_host_srv_rerror_rate     25192 non-null  float64
 41  class                        25192 non-null  object 
dtypes: float64(15), int64(23), object(4)
memory usage: 8.1+ MB
Out[5]:
(25192, 42)

Data Types:

The 'duration' column is currently of type int64, indicating it contains integer values.

'protocol_type', 'service', and 'flag' columns are of type object, suggesting categorical information.

Columns like 'src_bytes', 'dst_bytes', 'land', 'wrong_fragment', 'urgent', 'hot', and others are of type int64, representing numerical data.

Missing Values:

No missing values are reported for any column. All columns have a non-null count of 25192, indicating completeness.

Textual Data:

No explicit information is provided about columns containing textual data. However, based on common practices, columns like 'Payload Data', 'User Information', and 'Device Information' might contain textual information.

DataFrame Shape:

The DataFrame has a shape of (25192, 42), indicating it consists of 25,192 rows and 42 columns.

Class Distribution:

The 'class' column indicates the label for each entry, and there are no missing values. Further exploration is needed to understand the distribution of classes ('anomaly' and 'normal').

In [6]:
data['protocol_type'].value_counts()
Out[6]:
protocol_type
tcp     20526
udp      3011
icmp     1655
Name: count, dtype: int64
In [7]:
data['protocol_type'].unique()
Out[7]:
array(['tcp', 'udp', 'icmp'], dtype=object)

Protocol Type Summary:

The 'protocol_type' column in the dataset consists of three unique values: 'tcp', 'udp', and 'icmp'.

  • 'tcp': 20,526 entries
  • 'udp': 3,011 entries
  • 'icmp': 1,655 entries
In [8]:
data['service'].value_counts()
Out[8]:
service
http         8003
private      4351
domain_u     1820
smtp         1449
ftp_data     1396
             ... 
urh_i           4
red_i           3
pm_dump         3
tim_i           2
http_8001       1
Name: count, Length: 66, dtype: int64
In [9]:
data['service'].unique()
Out[9]:
array(['ftp_data', 'other', 'private', 'http', 'remote_job', 'name',
       'netbios_ns', 'eco_i', 'mtp', 'telnet', 'finger', 'domain_u',
       'supdup', 'uucp_path', 'Z39_50', 'smtp', 'csnet_ns', 'uucp',
       'netbios_dgm', 'urp_i', 'auth', 'domain', 'ftp', 'bgp', 'ldap',
       'ecr_i', 'gopher', 'vmnet', 'systat', 'http_443', 'efs', 'whois',
       'imap4', 'iso_tsap', 'echo', 'klogin', 'link', 'sunrpc', 'login',
       'kshell', 'sql_net', 'time', 'hostnames', 'exec', 'ntp_u',
       'discard', 'nntp', 'courier', 'ctf', 'ssh', 'daytime', 'shell',
       'netstat', 'pop_3', 'nnsp', 'IRC', 'pop_2', 'printer', 'tim_i',
       'pm_dump', 'red_i', 'netbios_ssn', 'rje', 'X11', 'urh_i',
       'http_8001'], dtype=object)

Service Summary:

The 'service' column in the dataset consists of 66 unique val and the most frequent is http.es.<->

In [10]:
data['flag'].value_counts()
Out[10]:
flag
SF        14973
S0         7009
REJ        2216
RSTR        497
RSTO        304
S1           88
SH           43
RSTOS0       21
S2           21
S3           15
OTH           5
Name: count, dtype: int64
In [11]:
data['flag'].unique()
Out[11]:
array(['SF', 'S0', 'REJ', 'RSTR', 'SH', 'RSTO', 'S1', 'RSTOS0', 'S3',
       'S2', 'OTH'], dtype=object)

Flag Summary:

The 'flag' column in the dataset consists of 11 unique values.

  • 'SF': 14,973 entries
  • 'S0': 7,009 entries
  • 'REJ': 2,216 entries
  • 'RSTR': 497 entries
  • 'RSTO': 304 entries
  • 'S1': 88 entries
  • 'SH': 43 entries
  • 'RSTOS0': 21 entries
  • 'S2': 21 entries
  • 'S3': 15 entries
  • 'OTH': 5 entries
In [12]:
data['class'].value_counts()
Out[12]:
class
normal     13449
anomaly    11743
Name: count, dtype: int64
In [13]:
data['class'].unique()
Out[13]:
array(['normal', 'anomaly'], dtype=object)

Class Distribution:

The 'class' column contains two unique values: 'normal' and 'anomaly'.

The distribution of classes is as follows:

  • Normal: 13,449 instances
  • Anomaly: 11,743 instances

Data Cleaning 🌟

In the grand tale of data cleaning, I wield the mighty SimpleImputer from the sacred pipeline, employing the ancient strategy of the wise median to cast away any lurking NaN creatures. Fear not, for every missing value shall be replaced, and the dataset shall emerge purified!

In [14]:
data.isnull().sum()
Out[14]:
duration                       0
protocol_type                  0
service                        0
flag                           0
src_bytes                      0
dst_bytes                      0
land                           0
wrong_fragment                 0
urgent                         0
hot                            0
num_failed_logins              0
logged_in                      0
num_compromised                0
root_shell                     0
su_attempted                   0
num_root                       0
num_file_creations             0
num_shells                     0
num_access_files               0
num_outbound_cmds              0
is_host_login                  0
is_guest_login                 0
count                          0
srv_count                      0
serror_rate                    0
srv_serror_rate                0
rerror_rate                    0
srv_rerror_rate                0
same_srv_rate                  0
diff_srv_rate                  0
srv_diff_host_rate             0
dst_host_count                 0
dst_host_srv_count             0
dst_host_same_srv_rate         0
dst_host_diff_srv_rate         0
dst_host_same_src_port_rate    0
dst_host_srv_diff_host_rate    0
dst_host_serror_rate           0
dst_host_srv_serror_rate       0
dst_host_rerror_rate           0
dst_host_srv_rerror_rate       0
class                          0
dtype: int64

Oh actually, Great!

It seems like there are no missing values in any of the columns. This is good news for your analysis, as having complete data allows for more accurate insights. If you have any specific actions or analyses you'd like to perform with this information, feel free to let me know!

🛠️ Feature Engineering

This section explores feature engineering techniques.

In [15]:
data['flagged_connection'] = data['flag'] + '_' + data['logged_in'].astype(str)
data['connection_duration'] = data['count'] - data['srv_count']

Flagged Connections:

Combine the flag and logged_in features to create a new categorical feature indicating whether the connection was flagged or not.

Connection Duration:

Introduce a feature representing the duration of the connection by subtracting srv_count from count. This can provide insights into the duration of services.

In [16]:
data['traffic_density'] = data['count'] * data['dst_host_srv_count']
data['high_priority'] = (data['urgent'] + data['hot']).apply(lambda x: 'Yes' if x > 0 else 'No')

Traffic Density:

Create a feature representing the overall traffic density by combining count and dst_host_srv_count.

High Priority Connections:

Combine urgent and hot features to create a new binary feature indicating whether the connection has high priority.

In [17]:
data['connection_stability'] = (data['same_srv_rate'] - data['diff_srv_rate']).abs()
data['host_similarity'] = data['same_srv_rate'] - data['dst_host_same_srv_rate']

Connection Stability:

Introduce a feature that reflects the stability of the connection by considering the ratio of successful connections to the total number of connections.

Host Similarity:

Compute a feature indicating how similar the host's services are to the average services on the destination host.

In [18]:
data['network_error_rate'] = (data['serror_rate'] + data['rerror_rate'] +
                            data['dst_host_serror_rate'] + data['dst_host_rerror_rate']) / 4

data['connection_diversity'] = data['diff_srv_rate'] / (data['srv_diff_host_rate'] + 1)
data['abnormal_behavior'] = (data['num_failed_logins'] + data['num_compromised'] +
                           data['num_root'] + data['num_file_creations'] +
                           data['num_shells'] + data['num_access_files'])

Network Error Rate:

Combine the error rates for source and destination to create a new feature indicating the overall network error rate.

Connection Diversity:

Create a feature representing the diversity of connections by considering the ratio of diff_srv_rate to srv_diff_host_rate.

Abnormal Behavior:

Identify abnormal behavior by combining various features related to abnormal activities.

In [19]:
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 25192 entries, 0 to 25191
Data columns (total 51 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   duration                     25192 non-null  int64  
 1   protocol_type                25192 non-null  object 
 2   service                      25192 non-null  object 
 3   flag                         25192 non-null  object 
 4   src_bytes                    25192 non-null  int64  
 5   dst_bytes                    25192 non-null  int64  
 6   land                         25192 non-null  int64  
 7   wrong_fragment               25192 non-null  int64  
 8   urgent                       25192 non-null  int64  
 9   hot                          25192 non-null  int64  
 10  num_failed_logins            25192 non-null  int64  
 11  logged_in                    25192 non-null  int64  
 12  num_compromised              25192 non-null  int64  
 13  root_shell                   25192 non-null  int64  
 14  su_attempted                 25192 non-null  int64  
 15  num_root                     25192 non-null  int64  
 16  num_file_creations           25192 non-null  int64  
 17  num_shells                   25192 non-null  int64  
 18  num_access_files             25192 non-null  int64  
 19  num_outbound_cmds            25192 non-null  int64  
 20  is_host_login                25192 non-null  int64  
 21  is_guest_login               25192 non-null  int64  
 22  count                        25192 non-null  int64  
 23  srv_count                    25192 non-null  int64  
 24  serror_rate                  25192 non-null  float64
 25  srv_serror_rate              25192 non-null  float64
 26  rerror_rate                  25192 non-null  float64
 27  srv_rerror_rate              25192 non-null  float64
 28  same_srv_rate                25192 non-null  float64
 29  diff_srv_rate                25192 non-null  float64
 30  srv_diff_host_rate           25192 non-null  float64
 31  dst_host_count               25192 non-null  int64  
 32  dst_host_srv_count           25192 non-null  int64  
 33  dst_host_same_srv_rate       25192 non-null  float64
 34  dst_host_diff_srv_rate       25192 non-null  float64
 35  dst_host_same_src_port_rate  25192 non-null  float64
 36  dst_host_srv_diff_host_rate  25192 non-null  float64
 37  dst_host_serror_rate         25192 non-null  float64
 38  dst_host_srv_serror_rate     25192 non-null  float64
 39  dst_host_rerror_rate         25192 non-null  float64
 40  dst_host_srv_rerror_rate     25192 non-null  float64
 41  class                        25192 non-null  object 
 42  flagged_connection           25192 non-null  object 
 43  connection_duration          25192 non-null  int64  
 44  traffic_density              25192 non-null  int64  
 45  high_priority                25192 non-null  object 
 46  connection_stability         25192 non-null  float64
 47  host_similarity              25192 non-null  float64
 48  network_error_rate           25192 non-null  float64
 49  connection_diversity         25192 non-null  float64
 50  abnormal_behavior            25192 non-null  int64  
dtypes: float64(19), int64(26), object(6)
memory usage: 9.8+ MB

📊 Visualization

This section focuses on visualizing the data .

The visualization aims to provide insights into the distribution of classes (normal or anomaly) for network connections that satisfy the specified criteria, including TCP protocol, 'S0' flag, high anomaly, and high priority.

In [22]:
fig = px.scatter(data,
                 x='num_failed_logins',
                 y='num_compromised',
                 color='abnormal_behavior',
                 size='abnormal_behavior',
                 hover_data=['service', 'flag'],  # Add more information for hover tooltips
                 title='Identification of Abnormal Behavior',
                 labels={'num_failed_logins': 'Number of Failed Logins',
                         'num_compromised': 'Number of Compromised Instances',
                         'abnormal_behavior': 'Abnormal Behavior Intensity'})

fig.update_layout(
    xaxis_title='Number of Failed Logins',
    yaxis_title='Number of Compromised Instances',
    coloraxis_colorbar=dict(title='Abnormal Behavior Intensity')
)

fig.show()

the scatter plot visually represents the relationship between the number of failed logins (x), the number of compromised instances (y), and abnormal behavior intensity (color and size). It provides a quick overview of the dataset, helping identify patterns and outliers related to abnormal behavior, with additional details available in hover tooltips.

In [23]:
box_fig = px.box(data, x='service', y='dst_bytes', color='abnormal_behavior',
                 title='Box Plot: Service vs. Destination Bytes',
                 labels={'service': 'Service', 'dst_bytes': 'Destination Bytes'})
box_fig.show()

the box plot visually represents the distribution of 'Destination Bytes' for different 'Service' categories. The boxes are colored based on the 'Abnormal Behavior' column, allowing for a quick visual assessment of how abnormal behavior relates to the distribution of destination bytes across different services.

In [24]:
scatter_fig = px.scatter(data, x='duration', y='src_bytes', color='abnormal_behavior',
                         title='Scatter Plot: Duration vs. Source Bytes',
                         labels={'duration': 'Duration', 'src_bytes': 'Source Bytes'})
scatter_fig.show()

the scatter plot visually represents the relationship between 'Duration' and 'Source Bytes', with each point colored based on the 'Abnormal Behavior' column. This plot provides insights into the distribution and potential patterns in the dataset concerning the duration of events and the amount of source bytes transferred.

In [27]:
fig_violin_flagged = px.violin(data, x='flagged_connection', y='connection_duration', box=True, points="all",
                               title='Violin Plot: Connection Duration by Flagged Connection',
                               labels={'flagged_connection': 'Flagged Connection', 'connection_duration': 'Connection Duration'})
fig_violin_flagged.show()

The plot helps analyze how the connection duration varies for different flagged connections. It provides insights into the central tendency, spread, and shape of the distribution within each flagged connection category. Outliers and patterns in the distribution can be identified.

Correlation

it assesses the strength and direction of a linear relationship between two numeric variables.

In [28]:
# Assuming your DataFrame is named 'data'
categorical_columns = ['protocol_type', 'service', 'flag', 'class', 'flagged_connection', 'high_priority']
data_corr = data.copy()

# Apply label encoding to each categorical column
label_encoder = LabelEncoder()
for column in categorical_columns:
    data_corr[column] = label_encoder.fit_transform(data_corr[column])

# Split features into groups of ten
feature_groups = [data_corr.columns[i:i+9].tolist() + ['class'] for i in range(0, len(data_corr.columns)-1, 9)]  # Exclude the last column 'abnormal_behavior'

# Visualize the correlation matrix for each group
for group in feature_groups:
    corr_group = data_corr[group].corr()
    plt.figure(figsize=(12, 8))
    sns.heatmap(corr_group, annot=True, fmt=".2f", cmap="RdBu")
    plt.show()
In [29]:
corr = data_corr.corr()

fig, ax = plt.subplots(1, 1, figsize=(15, 15))

target_corr = corr['class'].drop('class')

target_corr_sorted = target_corr.sort_values(ascending=False)

sns.heatmap(target_corr_sorted.to_frame(), cmap="RdBu", annot=True, fmt='.2f', cbar=False, ax=ax)
ax.set_title('Correlation with Attack Type')

plt.tight_layout()
plt.show()

🛡️ Intrusion Detection Features Analysis 🕵️‍♂️

Unlocking insights to fortify your defenses!

Key Features:

flag: 0.651309 (Strong positive correlation with intrusion)
logged_in: 0.688084 (Strong positive correlation with intrusion)
same_srv_rate: 0.749237 (Strong positive correlation with intrusion)
dst_host_srv_count: 0.719292 (Strong positive correlation with intrusion)
dst_host_same_srv_rate: 0.692212 (Strong positive correlation with intrusion)

Connection Stability:

connection_stability: 0.721484 (Strong positive correlation with intrusion)

Network Error Rate:

network_error_rate: -0.777671 (Strong negative correlation with intrusion)

Inverse Correlations:

Features related to error rates (serror_rate, srv_serror_rate, rerror_rate, srv_rerror_rate, dst_host_serror_rate, dst_host_srv_serror_rate, dst_host_rerror_rate, dst_host_srv_rerror_rate) are negatively correlated with the target variable. This suggests that higher error rates are associated with a higher likelihood of intrusion.

Connection Duration and Count:

connection_duration: -0.643156 (Strong negative correlation with intrusion)
count: -0.578790 (Moderate negative correlation with intrusion)

Traffic Density:

traffic_density: 0.042092 (Weak positive correlation with intrusion)

Additional Features:

srv_diff_host_rate: 0.120649 (Moderate positive correlation with intrusion)
high_priority: -0.026084 (Weak negative correlation with intrusion)

NaN Correlations:

Features like num_outbound_cmds and is_host_login have NaN correlations. These features might not provide useful information or could be constant in the dataset.

Building an effective Intrusion Detection System involves more than correlation analysis. Consider additional exploratory data analysis (EDA), feature engineering, and machine learning algorithms for classification. Remember the trade-off between false positives and false negatives when setting up your IDS. 🛡️

</div>

Outliers

This section deals with identifying and handling outliers.

In [30]:
selected_cols = ['logged_in', 'count', 'srv_count', 'serror_rate', 'srv_serror_rate', 'same_srv_rate',
                 'diff_srv_rate', 'dst_host_count', 'dst_host_diff_srv_rate', 'dst_host_srv_rerror_rate',
                 'connection_duration', 'host_similarity', 'connection_diversity']

# Create subplots
fig, axes = plt.subplots(nrows=4, ncols=4, figsize=(16, 16))

# Flatten the axes array for easy iteration
axes = axes.flatten()

# Iterate over selected columns and create boxplots
for i, column in enumerate(selected_cols):
    sns.boxplot(y=data[column], ax=axes[i])
    axes[i].set_title(f'Boxplot for {column}')

# Remove empty subplots
for i in range(len(selected_cols), len(axes)):
    fig.delaxes(axes[i])

plt.tight_layout()
plt.show()
# Example: Log transformation
data_log_transformed = np.log1p(data[selected_cols])

Duplicates

This section addresses the presence of duplicate data.

In [31]:
duplicate_count = data.duplicated().sum()

print("Number of duplicates:", duplicate_count)
Number of duplicates: 0

Data Splitting

perform a stratified shuffle split, ensuring that the distribution of the "normal or anomaly" variable is maintained in both the training and testing sets.

In [32]:
split=StratifiedShuffleSplit(n_splits=1 ,test_size=0.15 , random_state=42)
for train_index, test_index in split.split(data, data["class"]):
    strat_train_set = data.iloc[train_index]
    strat_test_set = data.iloc[test_index]
In [33]:
x_train=strat_train_set.drop('class',axis=1)
y_train=strat_train_set['class'].copy()

x_test=strat_test_set.drop('class',axis=1)
y_test=strat_test_set['class'].copy()

x_train_shape = x_train.shape
x_test_shape = x_test.shape

# Plotting the shapes
fig, ax = plt.subplots(figsize=(8, 5))

ax.bar(['x_train', 'x_test'], [x_train_shape[0], x_test_shape[0]], color=['blue', 'green'])
ax.set_ylabel('Number of Samples')
ax.set_title('Shape of x_train and x_test')

plt.show()

Custom Transformer

This custom transformer, CustomCategoricalTransformer, takes two lists of features (ordinal_features and onehot_features) and performs ordinal encoding on the specified ordinal features and one-hot encoding on the specified one-hot features. It then concatenates the results into a single DataFrame.

In [34]:
from scipy.sparse import hstack

class CustomCategoricalTransformer(BaseEstimator, TransformerMixin):
    def __init__(self, ordinal_features, onehot_features):
        self.ordinal_features = ordinal_features
        self.onehot_features = onehot_features
        self.ordinal_encoder = OrdinalEncoder()
        self.onehot_encoder = OneHotEncoder()

    def fit(self, X, y=None):
        ordinal_data = X[self.ordinal_features]
        onehot_data = X[self.onehot_features]

        self.ordinal_encoder.fit(ordinal_data)
        self.onehot_encoder.fit(onehot_data)

        return self

    def transform(self, X):
        ordinal_data = X[self.ordinal_features]
        onehot_data = X[self.onehot_features]

        ordinal_encoded = self.ordinal_encoder.transform(ordinal_data)
        onehot_encoded = self.onehot_encoder.transform(onehot_data)

        # Combine the encoded features as a sparse matrix
        transformed_data = hstack([ordinal_encoded, onehot_encoded])

        # Convert to DataFrame with column names
        transformed_df = pd.DataFrame(transformed_data.toarray().astype(np.float32), columns=self.get_feature_names_out(X.columns))

        return transformed_df

    def get_feature_names_out(self, input_features=None):
        ordinal_names = self.ordinal_encoder.get_feature_names_out(self.ordinal_features)
        onehot_names = self.onehot_encoder.get_feature_names_out(self.onehot_features)

        return np.concatenate((ordinal_names, onehot_names), axis=0)

Data Preprocessing

This section covers the overall data preprocessing steps.

In [35]:
# Assuming your DataFrame is named 'data'
categorical_columns =  x_train.select_dtypes(include=['object']).columns.tolist()
numeric_columns = x_train.select_dtypes(include=['int64', 'float64']).columns.tolist()

# Separate transformers for numerical and categorical columns
numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='mean')),
    ('scaler', StandardScaler())
])

categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='most_frequent')),
    ('encoder', OrdinalEncoder(handle_unknown='use_encoded_value', unknown_value=-1))
])

# Apply transformers to columns
preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numeric_columns),
        ('cat', categorical_transformer, categorical_columns)
    ])


x_train_pd=preprocessor.fit_transform(x_train)
x_test_pd=preprocessor.transform(x_test)

y_train_pd=y_train.map({'normal':0, 'anomaly':1})
y_test_pd=y_test.map({'normal':0 ,'anomaly':1})

x_train_pd=pd.DataFrame(x_train_pd,columns=preprocessor.get_feature_names_out())

x_train_pd
Out[35]:
num__duration num__src_bytes num__dst_bytes num__land num__wrong_fragment num__urgent num__hot num__num_failed_logins num__logged_in num__num_compromised ... num__connection_stability num__host_similarity num__network_error_rate num__connection_diversity num__abnormal_behavior cat__protocol_type cat__service cat__flag cat__flagged_connection cat__high_priority
0 -0.112590 -0.010278 -0.019938 -0.009665 -0.089518 -0.006834 -0.092169 -0.025654 1.233511 -0.021243 ... 0.750739 -0.484164 -0.849292 -0.350608 -0.022093 1.0 22.0 9.0 14.0 0.0
1 -0.112590 -0.010369 -0.039789 -0.009665 -0.089518 -0.006834 -0.092169 -0.025654 -0.810694 -0.021243 ... 0.750739 -0.484164 -0.849292 -0.350608 -0.022093 0.0 13.0 9.0 13.0 0.0
2 -0.112590 -0.010366 -0.039789 -0.009665 -0.089518 -0.006834 -0.092169 -0.025654 -0.810694 -0.021243 ... 0.750739 -0.484164 -0.849292 -0.350608 -0.022093 0.0 13.0 9.0 13.0 0.0
3 -0.112590 -0.010369 -0.039789 -0.009665 -0.089518 -0.006834 -0.092169 -0.025654 -0.810694 -0.021243 ... 0.750739 -0.484164 -0.849292 -0.350608 -0.022093 0.0 13.0 9.0 13.0 0.0
4 -0.112590 -0.010373 -0.039789 -0.009665 -0.089518 -0.006834 -0.092169 -0.025654 -0.810694 -0.021243 ... 0.750739 2.964577 0.420367 -0.350608 -0.022093 1.0 46.0 4.0 5.0 0.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
21408 0.854795 -0.010317 -0.038697 -0.009665 -0.089518 -0.006834 -0.092169 -0.025654 -0.810694 -0.021243 ... 0.750739 2.964577 -0.849292 -0.350608 -0.022093 2.0 41.0 9.0 13.0 0.0
21409 -0.112590 -0.010070 -0.035704 -0.009665 -0.089518 -0.006834 -0.092169 -0.025654 1.233511 -0.021243 ... 0.750739 0.412509 -0.849292 -0.350608 -0.022093 1.0 51.0 9.0 14.0 0.0
21410 -0.112590 -0.010356 -0.039020 -0.009665 -0.089518 -0.006834 -0.092169 -0.025654 -0.810694 -0.021243 ... 0.750739 -0.484164 -0.849292 -0.350608 -0.022093 2.0 11.0 9.0 13.0 0.0
21411 -0.111828 -0.010373 -0.039789 -0.009665 -0.089518 -0.006834 -0.092169 -0.025654 -0.810694 -0.021243 ... -0.346036 -0.484164 1.249317 2.585572 -0.022093 1.0 46.0 4.0 5.0 0.0
21412 -0.112590 -0.010373 -0.039789 -0.009665 -0.089518 -0.006834 -0.092169 -0.025654 -0.810694 -0.021243 ... 0.750739 2.964577 1.123401 -0.350608 -0.022093 1.0 46.0 1.0 1.0 0.0

21413 rows × 50 columns

The code sets up a comprehensive preprocessing pipeline to handle missing values, scale numeric features, and encode categorical features, preparing the data for training a machine learning model.

🤖 Machine Learning Modeling

Introduction

Project Summary:

In this task, I conducted five experiments, each using four different classification algorithms on a traffic dataset. After training and evaluating each algorithm, comparisons were made based on accuracy, precision, recall, F1 score, average metrics, and macro metrics. The best-performing algorithm from each experiment was selected.

Pre-Defined Usefull Function for Visualization

These functions can be used to visualize key aspects of classification results, such as confusion matrices, plot_classification_report,plot_precision_recall_f1_per_class,heatmaps and learning curves. Adjust the parameters as needed for your specific use case.

In [36]:
def plot_confusion_matrices(cvp,t_labels):
    labels=['Normal', 'Anomaly']
    fig, axs = plt.subplots(nrows=1, ncols=2, figsize=(16, 5))
    plt.rc('font', size=9)

    # Confusion Matrix
    disp = ConfusionMatrixDisplay.from_predictions(t_labels, cvp, display_labels=labels, ax=axs[0], cmap='YlGnBu')
    axs[0].set_title("Confusion matrix")

    # Normalized Confusion Matrix
    disp_norm = ConfusionMatrixDisplay.from_predictions(t_labels, cvp, display_labels=labels,
                                                      ax=axs[1], normalize="true", values_format=".0%",cmap='YlGnBu')
    axs[1].set_title("CM normalized by row")

    plt.subplots_adjust(wspace=0.5)
    plt.show()
In [37]:
def plot_classification_report(cvp, name_of_classifier, labels_prepared):
    # Generate the classification report
    report = classification_report(labels_prepared, cvp, target_names=name_of_classifier, output_dict=True)
    # Convert the classification report to a DataFrame for better visualization
    report_df = pd.DataFrame(report).transpose()
    # Plotting
    plt.figure(figsize=(15, 8))
    # Plot the heatmap
    sns.heatmap(report_df.iloc[:-1, :-1].astype(float), annot=True, cmap='RdBu', cbar=False)
    plt.title(' Classification Report Heatmap')
    plt.show()
In [38]:
def plot_precision_recall_f1_per_class(cvp,y_train_pd):
    precision_per_class = precision_score(y_train_pd, cvp, average=None)
    recall_per_class = recall_score(y_train_pd, cvp, average=None)
    f1_per_class = f1_score(y_train_pd, cvp, average=None)

# Bar plot
    fig, ax = plt.subplots(figsize=(15, 6))
    labels=['Normal', 'Anomaly'] 
    bar_width = 0.2
    index = np.arange(len(labels))

    bar1 = ax.bar(index, precision_per_class, bar_width, label='Precision')
    bar2 = ax.bar(index + bar_width, recall_per_class, bar_width, label='Recall')
    bar3 = ax.bar(index + 2 * bar_width, f1_per_class, bar_width, label='F1-Score')

    ax.set_xlabel('Classes')
    ax.set_ylabel('Scores')
    ax.set_title('Precision, Recall, and F1-Score by Class')
    ax.set_xticks(index + bar_width)
    ax.set_xticklabels(labels)
    ax.legend()

    plt.show()
In [39]:
def plot_classification_reports_heatmaps(predicted_labels_list, classifier_names,lables_prepared):
    labels=['Normal', 'Anomaly']  # Adjust based on your actual class labels
    num_classifiers = len(predicted_labels_list)
    num_rows = num_classifiers // 2
    num_cols = 2 if num_classifiers % 2 == 0 else 1  # Use 1 column for the last row if the number of classifiers is odd

    # Create subplots
    fig, axs = plt.subplots(nrows=num_rows, ncols=num_cols, figsize=(15, 12))

    # Flatten the subplot array
    axs_flat = axs.flatten()

    # Iterate through classifiers
    for i in range(num_classifiers):
        # Calculate the classification report
        report = classification_report(lables_prepared, predicted_labels_list[i], target_names=labels, output_dict=True)
        report_df = pd.DataFrame(report).transpose()

        # Create a heatmap
        sns.heatmap(report_df.iloc[:-1, :-1], annot=True, cmap='RdBu', cbar=False, ax=axs_flat[i])
        axs_flat[i].set_title(f'{classifier_names[i]} Classification Report Heatmap')

    # Adjust layout and display the heatmaps
    plt.subplots_adjust(wspace=0.2, hspace=0.2)
    plt.show()
In [40]:
def plot_classification_reports_heatmaps_2(predicted_labels_list, classifier_names, true_labels):
    labels=['Normal', 'Anomaly']
    num_classifiers = len(predicted_labels_list)

    num_rows = num_classifiers // 2 + num_classifiers % 2  # Adjust for odd number of classifiers

    fig, axs = plt.subplots(nrows=num_rows, ncols=2, figsize=(15, 6 * num_rows))

    for i, ax in enumerate(axs.flatten()):
        if i < num_classifiers:
            report = classification_report(true_labels, predicted_labels_list[i], target_names=labels, output_dict=True)
            report_df = pd.DataFrame(report).transpose()

            sb.heatmap(report_df.iloc[:-1, :-1], annot=True, cmap='RdBu', cbar=False, ax=ax)
            ax.set_title(f'{classifier_names[i]} Classification Report Heatmap')
        else:
            fig.delaxes(ax)

    plt.subplots_adjust(wspace=0.2, hspace=0.4)  # Adjust spacing
    plt.show()
In [41]:
def plot_multiclass_precision_recall_curve(classifier):
    labels = ['Normal', 'Anomaly']
    
    # Binarize the labels
    y_bin = label_binarize(y_train_pd, classes=list(set(y_train_pd)))

    # Train the classifier and get decision function scores
    y_score = classifier.fit(x_train_pd, y_train_pd).decision_function(x_train_pd)

    # Compute precision-recall pairs for each class
    precision = dict()
    recall = dict()
    for i in range(len(set(y_train_pd))):
        precision[i], recall[i], _ = precision_recall_curve(y_bin, y_score[:, i])
        plt.plot(recall[i], precision[i], label=labels[i])

    plt.xlabel('Recall')
    plt.ylabel('Precision')
    plt.title('Multiclass Precision-Recall Curve')
    plt.legend()
    plt.show()
In [42]:
def plot_comparison(train_predictions, train_labels, test_predictions, test_labels, labels):
    # Calculate metrics for training set
    train_accuracy = accuracy_score(train_labels, train_predictions)
    train_precision = precision_score(train_labels, train_predictions, average='weighted')
    train_recall = recall_score(train_labels, train_predictions, average='weighted')
    train_f1 = f1_score(train_labels, train_predictions, average='weighted')

    # Calculate metrics for test set
    test_accuracy = accuracy_score(test_labels, test_predictions)
    test_precision = precision_score(test_labels, test_predictions, average='weighted')
    test_recall = recall_score(test_labels, test_predictions, average='weighted')
    test_f1 = f1_score(test_labels, test_predictions, average='weighted')

    # Prepare data for the plot
    metrics = ['Accuracy', 'Precision', 'Recall', 'F1 Score']
    train_values = [train_accuracy, train_precision, train_recall, train_f1]
    test_values = [test_accuracy, test_precision, test_recall, test_f1]

    # Create a DataFrame for visualization
    df = pd.DataFrame({'Metric': metrics * 2, 'Value': train_values + test_values, 'Set': ['Train'] * 4 + ['Test'] * 4})

    # Plotting
    plt.figure(figsize=(15, 6))
    sns.barplot(x='Metric', y='Value', hue='Set', data=df, palette='Set2')
    plt.title('Model Performance Comparison (Train vs Test)')
    plt.show()
In [43]:
def plot_comparison(train_predictions, train_labels, test_predictions, test_labels, labels):
    # Calculate metrics for training set
    train_accuracy = accuracy_score(train_labels, train_predictions)
    train_precision = precision_score(train_labels, train_predictions, average='weighted')
    train_recall = recall_score(train_labels, train_predictions, average='weighted')
    train_f1 = f1_score(train_labels, train_predictions, average='weighted')

    # Calculate metrics for test set
    test_accuracy = accuracy_score(test_labels, test_predictions)
    test_precision = precision_score(test_labels, test_predictions, average='weighted')
    test_recall = recall_score(test_labels, test_predictions, average='weighted')
    test_f1 = f1_score(test_labels, test_predictions, average='weighted')

    # Prepare data for the plot
    metrics = ['Accuracy', 'Precision', 'Recall', 'F1 Score']
    train_values = [train_accuracy, train_precision, train_recall, train_f1]
    test_values = [test_accuracy, test_precision, test_recall, test_f1]

    # Create a DataFrame for visualization
    df = pd.DataFrame({'Metric': metrics * 2, 'Value': train_values + test_values, 'Set': ['Train'] * 4 + ['Test'] * 4})

    # Plotting
    plt.figure(figsize=(15, 6))
    sns.barplot(x='Metric', y='Value', hue='Set', data=df, palette='Set2')
    plt.title('Model Performance Comparison (Train vs Test)')
    plt.show()
In [44]:
def precisions_recalls_thresholds(cvp,labels):

    threshold = 0

    precisions, recalls, thresholds = precision_recall_curve(labels,cvp)
    plt.figure(figsize=(8, 4))  
    plt.plot(thresholds, precisions[:-1], "b--", label="Precision", linewidth=2)
    plt.plot(thresholds, recalls[:-1], "g-", label="Recall", linewidth=2)
    plt.vlines(threshold, 0, 1.0, "k", "dotted", label="threshold")

    idx = (thresholds >= threshold).argmax()  
    plt.plot(thresholds[idx], precisions[idx], "bo")
    plt.plot(thresholds[idx], recalls[idx], "go")
    plt.axis([thresholds.min(), thresholds.max(), 0, 1])
    plt.grid()
    plt.xlabel("Threshold")
    plt.legend(loc="center right")

Dummy Classifier

Highlights the minimal performance achievable with a simplistic approach.

In [45]:
dummy_clf = DummyClassifier(strategy='stratified')
dummy_cvd_pred= cross_val_predict(dummy_clf, x_train_pd, y_train_pd, cv=3)
plot_confusion_matrices(dummy_cvd_pred,y_train_pd)
In [46]:
dummy_a=accuracy_score(y_train_pd,dummy_cvd_pred)
dummy_a
Out[46]:
0.5072152430766357

This Dummy Classifier is intentionally simplistic, predicting the most frequent class without considering the features. It provides a minimal benchmark against which other models can be compared.

LogisticRegression

This code snippet involves the instantiation and cross-validated predictions of LogisticRegression,the classifier is configured with specific parameters to evaluate their performance.

In [47]:
log_reg = LogisticRegression(random_state=42)
log_param = [{'C' : [1,10,50] , 'max_iter' : [500,1000,2000]}]
grid_search = GridSearchCV(log_reg,log_param,cv=3,scoring='accuracy')
grid_search.fit(x_train_pd,y_train_pd)
Out[47]:
GridSearchCV(cv=3, estimator=LogisticRegression(random_state=42),
             param_grid=[{'C': [1, 10, 50], 'max_iter': [500, 1000, 2000]}],
             scoring='accuracy')
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
GridSearchCV(cv=3, estimator=LogisticRegression(random_state=42),
             param_grid=[{'C': [1, 10, 50], 'max_iter': [500, 1000, 2000]}],
             scoring='accuracy')
LogisticRegression(random_state=42)
LogisticRegression(random_state=42)
In [48]:
log_reg=grid_search.best_estimator_
grid_search.best_params_
Out[48]:
{'C': 10, 'max_iter': 500}
In [49]:
log_cvp= cross_val_predict(log_reg, x_train_pd, y_train_pd, cv=3)
plot_confusion_matrices(log_cvp,y_train_pd)

generating confusion matrices to illustrate how well the model performs on the training data. By visualizing the distribution of predicted and actual labels, it provides insights into the model's accuracy and its ability to correctly classify different classes

In [50]:
plot_classification_report(log_cvp, ['normal', 'anomaly'], y_train_pd)
In [51]:
plot_precision_recall_f1_per_class(log_cvp,y_train_pd)

Showing the precision, reacall and F1 scores for the model on the test set for each class (normal,anomaly)

KNeighborsClassifier

This code snippet involves the instantiation and cross-validated predictions ofKNeighborsClassifiern,the classifier is configured with specific parameters to evaluate their performance.

In [52]:
knn_clf = KNeighborsClassifier()
knn_param = [{'n_neighbors':[3,5,8,12],'weights':['uniform','distance'],
              'algorithm':['auto','ball_tree','kd_tree','brute']
             }]
grid_search4 = GridSearchCV(knn_clf,knn_param,cv=3,scoring='accuracy')
grid_search4.fit(x_train_pd,y_train_pd)
Out[52]:
GridSearchCV(cv=3, estimator=KNeighborsClassifier(),
             param_grid=[{'algorithm': ['auto', 'ball_tree', 'kd_tree',
                                        'brute'],
                          'n_neighbors': [3, 5, 8, 12],
                          'weights': ['uniform', 'distance']}],
             scoring='accuracy')
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
GridSearchCV(cv=3, estimator=KNeighborsClassifier(),
             param_grid=[{'algorithm': ['auto', 'ball_tree', 'kd_tree',
                                        'brute'],
                          'n_neighbors': [3, 5, 8, 12],
                          'weights': ['uniform', 'distance']}],
             scoring='accuracy')
KNeighborsClassifier()
KNeighborsClassifier()
In [53]:
knc = grid_search4.best_estimator_
grid_search4.best_params_
Out[53]:
{'algorithm': 'auto', 'n_neighbors': 3, 'weights': 'distance'}
In [54]:
knc_cvp= cross_val_predict(knc, x_train_pd, y_train_pd, cv=3)
plot_confusion_matrices(knc_cvp,y_train_pd)

generating confusion matrices to illustrate how well the model performs on the training data. By visualizing the distribution of predicted and actual labels, it provides insights into the model's accuracy and its ability to correctly classify different classes

In [55]:
plot_classification_report(knc_cvp, ['normal', 'anomaly'], y_train_pd)
In [56]:
plot_precision_recall_f1_per_class(knc_cvp,y_train_pd)

Showing the precision, reacall and F1 scores for the model on the test set for each class (normal,anomaly)

Support Vector Classifier

This code snippet focuses on the instantiation and cross-validated predictions Support Vector Classifier, The classifiers are designed for a binary-class classification task, and specific parameters are set for each to assess their performance.

In [57]:
svc_clf = SVC()
svc_param = [{'C':[10,100],'kernel':['poly','rbf']
             }]
grid_search1 = GridSearchCV(svc_clf,svc_param,cv=3,scoring='accuracy')
grid_search1.fit(x_train_pd,y_train_pd)
Out[57]:
GridSearchCV(cv=3, estimator=SVC(),
             param_grid=[{'C': [10, 100], 'kernel': ['poly', 'rbf']}],
             scoring='accuracy')
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
GridSearchCV(cv=3, estimator=SVC(),
             param_grid=[{'C': [10, 100], 'kernel': ['poly', 'rbf']}],
             scoring='accuracy')
SVC()
SVC()
In [58]:
svc=grid_search1.best_estimator_
grid_search1.best_params_
Out[58]:
{'C': 100, 'kernel': 'rbf'}
In [59]:
svc_cvp= cross_val_predict(svc, x_train_pd, y_train_pd, cv=3)
plot_confusion_matrices(svc_cvp,y_train_pd)

generating confusion matrices to illustrate how well the model performs on the training data. By visualizing the distribution of predicted and actual labels, it provides insights into the model's accuracy and its ability to correctly classify different classes

In [60]:
plot_classification_report(svc_cvp, ['normal', 'anomaly'], y_train_pd)
In [61]:
plot_precision_recall_f1_per_class(svc_cvp,y_train_pd)

Showing the precision, reacall and F1 scores for the model on the test set for each class (normal,anomaly)

Decision Tree Classifier

This code snippet focuses on the instantiation and cross-validated predictions of Decision Tree Classifier, The classifiers are designed for a bin-class classification task, and specific parameters are set for each to assess their performance.

In [62]:
DTC_clf =  DecisionTreeClassifier(random_state=42)
DTC_param = [{'splitter':['best','random'],'max_depth':[2,3,5],
              'criterion':['gini','entropy']
             }]
grid_search2 = GridSearchCV(DTC_clf,DTC_param,cv=5,scoring='accuracy')
grid_search2.fit(x_train_pd,y_train_pd)
Out[62]:
GridSearchCV(cv=5, estimator=DecisionTreeClassifier(random_state=42),
             param_grid=[{'criterion': ['gini', 'entropy'],
                          'max_depth': [2, 3, 5],
                          'splitter': ['best', 'random']}],
             scoring='accuracy')
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
GridSearchCV(cv=5, estimator=DecisionTreeClassifier(random_state=42),
             param_grid=[{'criterion': ['gini', 'entropy'],
                          'max_depth': [2, 3, 5],
                          'splitter': ['best', 'random']}],
             scoring='accuracy')
DecisionTreeClassifier(random_state=42)
DecisionTreeClassifier(random_state=42)
In [63]:
DTC= grid_search2.best_estimator_
grid_search2.best_params_
Out[63]:
{'criterion': 'entropy', 'max_depth': 5, 'splitter': 'best'}
In [64]:
DTC_cvp= cross_val_predict(DTC, x_train_pd, y_train_pd, cv=3)
plot_confusion_matrices(DTC_cvp,y_train_pd)

generating confusion matrices to illustrate how well the model performs on the training data. By visualizing the distribution of predicted and actual labels, it provides insights into the model's accuracy and its ability to correctly classify different classes

In [65]:
plot_classification_report(DTC_cvp, ['normal', 'anomaly'], y_train_pd)
In [66]:
plot_precision_recall_f1_per_class(DTC_cvp,y_train_pd)

Showing the precision, reacall and F1 scores for the model on the test set for each class (normal,anomaly)

In [67]:
plt.figure(figsize=(15,11))
tree.plot_tree(DTC,filled=True,rounded=True)
plt.show()

Showing the tree structure for the decision tree classifier (best estimator)

Random Forest Classifier

This code snippet focuses on the instantiation and cross-validated predictions of Random Forest Classifier , The classifiers are designed for a bin-class classification task, and specific parameters are set for each to assess their performance.

In [68]:
rnd_clf = RandomForestClassifier(random_state=42)
rnd_param = [{'n_estimators':[50,100,200],'criterion':['gini', 'entropy','log_loss'],
              'max_depth':[4,5],'max_features':['sqrt','log2']
             }]
grid_search3 = GridSearchCV(rnd_clf,rnd_param,cv=3,scoring='accuracy',error_score='raise')
grid_search3.fit(x_train_pd,y_train_pd)
Out[68]:
GridSearchCV(cv=3, error_score='raise',
             estimator=RandomForestClassifier(random_state=42),
             param_grid=[{'criterion': ['gini', 'entropy', 'log_loss'],
                          'max_depth': [4, 5], 'max_features': ['sqrt', 'log2'],
                          'n_estimators': [50, 100, 200]}],
             scoring='accuracy')
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
GridSearchCV(cv=3, error_score='raise',
             estimator=RandomForestClassifier(random_state=42),
             param_grid=[{'criterion': ['gini', 'entropy', 'log_loss'],
                          'max_depth': [4, 5], 'max_features': ['sqrt', 'log2'],
                          'n_estimators': [50, 100, 200]}],
             scoring='accuracy')
RandomForestClassifier(random_state=42)
RandomForestClassifier(random_state=42)
In [69]:
rnd = grid_search3.best_estimator_
grid_search3.best_params_
Out[69]:
{'criterion': 'entropy',
 'max_depth': 5,
 'max_features': 'sqrt',
 'n_estimators': 200}
In [70]:
rnd_cvp= cross_val_predict(rnd, x_train_pd, y_train_pd, cv=3)
plot_confusion_matrices(rnd_cvp,y_train_pd)

generating confusion matrices to illustrate how well the model performs on the training data. By visualizing the distribution of predicted and actual labels, it provides insights into the model's accuracy and its ability to correctly classify different classes

In [71]:
plot_classification_report(rnd_cvp, ['normal', 'anomaly'], y_train_pd)
In [72]:
plot_precision_recall_f1_per_class(rnd_cvp,y_train_pd)
In [73]:
score = np.round(rnd.feature_importances_, 10)
importances = pd.DataFrame({'feature': x_train_pd.columns, 'importance': score})

# Sort the importances by importance value
importances = importances.sort_values('importance', ascending=False).set_index('feature')

# Set a color palette for the plot
colors = sns.color_palette("viridis", len(importances))

# Plot importances with a beautiful design
plt.figure(figsize=(12, 6))
sns.barplot(x=importances.index, y=importances['importance'], palette=colors)
plt.title('Feature Importances in Random Forest Classifier', fontsize=16)
plt.xlabel('Features', fontsize=14)
plt.ylabel('Importance', fontsize=14)
plt.xticks(rotation=45, ha='right', fontsize=12)
plt.yticks(fontsize=12)
plt.show()

Plot that shows the importance of features that affects the random forest decision in classification and showing that the most important featies is the the number of source bytes

Ensamble (Voting ,Stacking, Bagging Classifiers and Try some new classifiers )

This code snippet combines the winning algorithms from five previous experiments using ensemble techniques, specifically Voting Classifier and Stacking Classifier. The selected winning algorithms from the experiments are LogisticRegression, RandomForestClassifier, BaggingClassifier with an XGBClassifier as its base estimator, and SVC with an RBF kernel.

In [74]:
estimators=[
        ('log_reg',LogisticRegression(max_iter=400)),
        ('RF', RandomForestClassifier()),
        ('Bagging', BaggingClassifier(XGBClassifier())),
        ('SVC',  SVC(kernel="rbf", C=100))
    ]

voting_clf = VotingClassifier(estimators=estimators,voting='hard')
voting_cvp= cross_val_predict(voting_clf, x_train_pd, y_train_pd, cv=3)
In [75]:
stacking_clf = StackingClassifier(
    estimators=estimators,
    final_estimator=RandomForestClassifier(random_state=43),
    cv=5  # number of cross-validation folds
)
stacking_cvp= cross_val_predict(stacking_clf, x_train_pd, y_train_pd, cv=3)
In [76]:
bag3_clf = BaggingClassifier(GaussianProcessClassifier(), n_estimators=100,
                            max_samples=100, n_jobs=-1, random_state=42)

bag3_cvp= cross_val_predict(bag3_clf, x_train_pd, y_train_pd, cv=3)
In [77]:
HBC_clf = HistGradientBoostingClassifier(loss='log_loss',
    learning_rate=0.1,
    max_iter=100,
    max_leaf_nodes=31,
    max_depth=None,
    min_samples_leaf=50,
    early_stopping='auto',)

HBC_cvp= cross_val_predict(HBC_clf, x_train_pd, y_train_pd, cv=3)

This code defines and evaluates multiple VotingClassifier, HistGradientBoostingClassifier, and GaussianProcessClassifier and StackingClassifier,BaggingClassifier for a bin-class classification task. Each classifier is instantiated with distinct parameter settings. The generation of cross-validated predictions is a key step, enabling a thorough assessment and comparison of the models' performance on the provided training data. The choice of parameters is critical in determining how well each classifier can effectively handle the multi-class classification problem.

In [78]:
predicted_labels_list=[voting_cvp,stacking_cvp,bag3_cvp,HBC_cvp]
classifier_names=['VotingClassifier','StackingClassifier','BaggingClassifier','HistGradientBoostingClassifier']
plot_classification_reports_heatmaps(predicted_labels_list, classifier_names,y_train_pd)

This code generates heatmaps to visualize classification reports for a group of classifiers, facilitating a rapid comparison of their performance. The GaussianProcessClassifier model, configured with specific parameters, is recognized as the top performer, demonstrating superior accuracy, precision, recall, F1 score, and macro metrics. The analysis also reveals overfitting issues in the HistGradientBoostingClassifier and DecisionTreeClassifier, leading to their exclusion from further consideration. The parameters chosen for the GaussianProcessClassifier model significantly contribute to achieving the highest overall performance among the algorithms under consideration.

In [79]:
plot_confusion_matrices(voting_cvp,y_train_pd)

The function likely generates confusion matrices to illustrate how well the VotingClassifier model performs on the training data. By visualizing the distribution of predicted and actual labels, it provides insights into the model's accuracy and its ability to correctly classify different classes.

Comparison Between All classifiers and select the best one.

In [80]:
fig, [ax_roc, ax_det] = plt.subplots(1, 2, figsize=(11, 5))
classifiers = {
    "Linear Regression": log_reg ,
    "K Neighbor": knc,
    'Decision tree' : DTC,
    'Random Forest': rnd,
}

for name, clf in classifiers.items():
     clf.fit(x_train_pd, y_train_pd)
     RocCurveDisplay.from_estimator(clf, x_test_pd, y_test_pd, ax=ax_roc, name=name)
     DetCurveDisplay.from_estimator(clf, x_test_pd, y_test_pd, ax=ax_det, name=name)

ax_roc.set_title("Receiver Operating Characteristic (ROC) curves")
ax_det.set_title("Detection Error Tradeoff (DET) curves")

ax_roc.grid(linestyle="--")
ax_det.grid(linestyle="--")

plt.legend()
plt.show()

Evalute the performance of the classifiers and comparing them on the same plot using ROC curve and DET curve

In [81]:
selected_features = ["num__network_error_rate","cat__service"]

X = x_train_pd[selected_features]
y = y_train

# Encode the categorical labels to numerical values
label_encoder = LabelEncoder()
y_encoded = label_encoder.fit_transform(y)

# Create an instance of SVM and fit our data. We do not scale our data since we want to plot the support vectors
 # SVM regularization parameter
models = [
    log_reg,
    knc,
    DTC,
    rnd,
    stacking_clf,
    bag3_clf,
]

# title for the plots
titles = [
    "logistic regression",
    "KNeighborsClassifier",
    "DecisionTreeClassifier",
    "RandomForestClassifier",
    'stacking_clf',
    'bag3_clf'
]

# Create a separate plot for each classifier
fig, sub = plt.subplots(3, 2, figsize=(15, 20))
plt.subplots_adjust(wspace=0.4, hspace=0.4)

for clf, title, ax in zip(models, titles, sub.flatten()):
    clf.fit(X, y)
    disp = DecisionBoundaryDisplay.from_estimator(
        clf,
        X,
        response_method="predict",
        cmap=plt.cm.coolwarm,
        alpha=0.8,
        ax=ax,
        xlabel=selected_features[0],
        ylabel=selected_features[1],
    )

    # Scatter plot the points from the features selected and their corresponding encoded labels
    ax.scatter(X[selected_features[0]], X[selected_features[1]], c=y_encoded, cmap=plt.cm.coolwarm, s=20, edgecolors="k")
    
    ax.set_xticks(())
    ax.set_yticks(())
    ax.set_title(title)

plt.show()

Great ,That sounds like a well-considered decision! The RandomForestClassifier is a powerful ensemble method that combines the strengths of multiple base models to improve overall predictive performance.

Great ,That sounds like a well-considered decision! The Stacking Classifier is a powerful ensemble method that combines the strengths of multiple base models to improve overall predictive performance.

Fine-Tune RandomizedSearchCV

RandomizedSearchCV is employed to explore different combinations of hyperparameters.

In [82]:
param_dist = {
    'n_estimators': randint(50, 500),
    'criterion': ["gini", "entropy", "log_loss"],
    'min_samples_split': randint(2, 200),
    'min_samples_leaf': randint(1, 200),
    'bootstrap': [True, False],
    'max_features': ["sqrt", "log2"],
    'max_depth':[4,5],
    'max_features':['sqrt','log2'],
}

random_search = RandomizedSearchCV(
    rnd_clf,
    param_distributions=param_dist,
    n_iter=10,  # Adjust the number of iterations based on your resources
    cv=3,
    scoring='accuracy',
    n_jobs=-1
)

random_search.fit(x_train_pd, y_train_pd)
Out[82]:
RandomizedSearchCV(cv=3, estimator=RandomForestClassifier(random_state=42),
                   n_jobs=-1,
                   param_distributions={'bootstrap': [True, False],
                                        'criterion': ['gini', 'entropy',
                                                      'log_loss'],
                                        'max_depth': [4, 5],
                                        'max_features': ['sqrt', 'log2'],
                                        'min_samples_leaf': <scipy.stats._distn_infrastructure.rv_discrete_frozen object at 0x00000272E147E310>,
                                        'min_samples_split': <scipy.stats._distn_infrastructure.rv_discrete_frozen object at 0x00000272DE5671D0>,
                                        'n_estimators': <scipy.stats._distn_infrastructure.rv_discrete_frozen object at 0x00000272DB631BD0>},
                   scoring='accuracy')
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
RandomizedSearchCV(cv=3, estimator=RandomForestClassifier(random_state=42),
                   n_jobs=-1,
                   param_distributions={'bootstrap': [True, False],
                                        'criterion': ['gini', 'entropy',
                                                      'log_loss'],
                                        'max_depth': [4, 5],
                                        'max_features': ['sqrt', 'log2'],
                                        'min_samples_leaf': <scipy.stats._distn_infrastructure.rv_discrete_frozen object at 0x00000272E147E310>,
                                        'min_samples_split': <scipy.stats._distn_infrastructure.rv_discrete_frozen object at 0x00000272DE5671D0>,
                                        'n_estimators': <scipy.stats._distn_infrastructure.rv_discrete_frozen object at 0x00000272DB631BD0>},
                   scoring='accuracy')
RandomForestClassifier(random_state=42)
RandomForestClassifier(random_state=42)

The function likely generates confusion matrices to illustrate how well the Logistic Regression model performs on the training data. By visualizing the distribution of predicted and actual labels, it provides insights into the model's accuracy and its ability to correctly classify different classes.

In [83]:
random_search.best_params_
Out[83]:
{'bootstrap': True,
 'criterion': 'entropy',
 'max_depth': 4,
 'max_features': 'sqrt',
 'min_samples_leaf': 119,
 'min_samples_split': 147,
 'n_estimators': 249}
In [84]:
categorical_columns = data.select_dtypes(include=['number']).columns.tolist()
print(categorical_columns)
['duration', 'src_bytes', 'dst_bytes', 'land', 'wrong_fragment', 'urgent', 'hot', 'num_failed_logins', 'logged_in', 'num_compromised', 'root_shell', 'su_attempted', 'num_root', 'num_file_creations', 'num_shells', 'num_access_files', 'num_outbound_cmds', 'is_host_login', 'is_guest_login', 'count', 'srv_count', 'serror_rate', 'srv_serror_rate', 'rerror_rate', 'srv_rerror_rate', 'same_srv_rate', 'diff_srv_rate', 'srv_diff_host_rate', 'dst_host_count', 'dst_host_srv_count', 'dst_host_same_srv_rate', 'dst_host_diff_srv_rate', 'dst_host_same_src_port_rate', 'dst_host_srv_diff_host_rate', 'dst_host_serror_rate', 'dst_host_srv_serror_rate', 'dst_host_rerror_rate', 'dst_host_srv_rerror_rate', 'connection_duration', 'traffic_density', 'connection_stability', 'host_similarity', 'network_error_rate', 'connection_diversity', 'abnormal_behavior']

This code snippet performs a randomized search for hyperparameter tuning on a Stacking Classifier. The hyperparameters of the base models within the Stacking Classifier are optimized to enhance its overall performance. The search is conducted using RandomizedSearchCV, which randomly samples a specified number of hyperparameter combinations from the provided parameter distributions.

Final Model and Test

This process ensures a quantitative assessment of the final model's effectiveness on the test data, providing valuable insights into its real-world performance.

In [85]:
final_model= random_search.best_estimator_
In [86]:
final_model_predictions= cross_val_predict(final_model, x_test_pd, y_test_pd, cv=3)
plot_confusion_matrices(final_model_predictions,y_test_pd)

This process combines cross-validated predictions and confusion matrix visualization to offer a comprehensive understanding of the final model's behavior on the test data.

In [87]:
plot_classification_report(final_model_predictions, ['normal','anomaly'], y_test_pd)
In [88]:
plot_comparison(stacking_cvp, y_train_pd, final_model_predictions, y_test_pd, ['normal', 'anomaly'])

This visualization aims to highlight discrepancies in evaluation metrics between the training and test datasets, offering insights into the model's generalization capabilities.

🤖 Neural Network

Introduction to Deep Neural Network

In [89]:
Image(filename='DNN.png')
Out[89]:

How do neural networks work?

Think of each individual node as its own linear regression model, composed of input data, weights, a bias (or threshold), and an output. The formula would look something like this:

∑wixi + bias = w1x1 + w2x2 + w3x3 + bias

output = f(x) or Activation Function = 1 if ∑w1x1 + b>= 0; 0 if ∑w1x1 + b

Perceptron

This process a simple artificial neural network but we can import it from sklearn and use anything we used on the classification above and below it shows that percptron gives us 95% accuracy

In [90]:
prc_clf = Perceptron()
prc_param = [{'fit_intercept' : [True ,False],'early_stopping':[True,False],
              'max_iter':[100,500,1000]
             }]
grid_searchANN = GridSearchCV(prc_clf,prc_param,cv=3,scoring='accuracy')
grid_searchANN.fit(x_train_pd,y_train_pd)
Out[90]:
GridSearchCV(cv=3, estimator=Perceptron(),
             param_grid=[{'early_stopping': [True, False],
                          'fit_intercept': [True, False],
                          'max_iter': [100, 500, 1000]}],
             scoring='accuracy')
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
GridSearchCV(cv=3, estimator=Perceptron(),
             param_grid=[{'early_stopping': [True, False],
                          'fit_intercept': [True, False],
                          'max_iter': [100, 500, 1000]}],
             scoring='accuracy')
Perceptron()
Perceptron()
In [91]:
prc = grid_searchANN.best_estimator_
grid_searchANN.best_params_
Out[91]:
{'early_stopping': False, 'fit_intercept': True, 'max_iter': 100}
In [92]:
prc_cvp= cross_val_predict(prc, x_train_pd, y_train_pd, cv=3)
plot_confusion_matrices(prc_cvp,y_train_pd)
In [93]:
plot_classification_report(prc_cvp, ['normal', 'anomaly'], y_train_pd)
In [94]:
plot_precision_recall_f1_per_class(prc_cvp,y_train_pd)

Multilayer Perceptron (MLP)

First using the MLP in the sklearn then we are going to build it using keras

In [95]:
from sklearn.neural_network import MLPClassifier
mlp_clf = MLPClassifier(early_stopping=True,momentum=0.9)
mlp_param = [{'hidden_layer_sizes': [1,3,5] 
             }]
grid_searchANN2 = GridSearchCV(mlp_clf,mlp_param,cv=3,scoring='accuracy')
grid_searchANN2.fit(x_train_pd,y_train_pd)
Out[95]:
GridSearchCV(cv=3, estimator=MLPClassifier(early_stopping=True),
             param_grid=[{'hidden_layer_sizes': [1, 3, 5]}],
             scoring='accuracy')
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
GridSearchCV(cv=3, estimator=MLPClassifier(early_stopping=True),
             param_grid=[{'hidden_layer_sizes': [1, 3, 5]}],
             scoring='accuracy')
MLPClassifier(early_stopping=True)
MLPClassifier(early_stopping=True)
In [96]:
MLP = grid_searchANN2.best_estimator_
grid_searchANN2.best_params_
Out[96]:
{'hidden_layer_sizes': 5}
In [97]:
MLP_cvp= cross_val_predict(MLP, x_train_pd, y_train_pd, cv=3)
plot_confusion_matrices(MLP_cvp,y_train_pd)
In [98]:
plot_classification_report(MLP_cvp, ['normal', 'anomaly'], y_train_pd)
In [99]:
plot_precision_recall_f1_per_class(MLP_cvp,y_train_pd)

Sequential API

In [100]:
#Clear global state in memory to avoid clutter from old models.
tf.keras.backend.clear_session()
WARNING:tensorflow:From C:\Users\abeda\anaconda3\Lib\site-packages\keras\src\backend.py:277: The name tf.reset_default_graph is deprecated. Please use tf.compat.v1.reset_default_graph instead.

Building the Model

In [101]:
model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(64, activation="relu", 
                                kernel_initializer="he_normal", kernel_regularizer=tf.keras.regularizers.l2(0.01),
                                name='input', input_shape=(50,)))

model.add(keras.layers.Dropout(rate=0.2))

model.add(tf.keras.layers.Dense(30, activation="relu", kernel_initializer="he_normal",kernel_regularizer=tf.keras.regularizers.l2(0.01),
                                name='hidden1'))

model.add(keras.layers.Dropout(rate=0.2))

model.add(tf.keras.layers.Dense(20, activation="relu", kernel_initializer='he_normal',kernel_regularizer=tf.keras.regularizers.l2(0.01),
                                name='hidden2'))

model.add(keras.layers.Dropout(rate=0.2))

model.add(tf.keras.layers.Dense(15, activation="relu", kernel_initializer='he_normal',kernel_regularizer=tf.keras.regularizers.l2(0.01),
                                name='hidden3'))

model.add(keras.layers.Dropout(rate=0.2))

model.add(tf.keras.layers.Dense(5, activation="relu", kernel_initializer="he_normal",kernel_regularizer=tf.keras.regularizers.l2(0.01),
                                name='hidden4'))

he_avg_init = tf.keras.initializers.VarianceScaling(scale=2., mode="fan_avg",
                                                    distribution="uniform")

model.add(tf.keras.layers.Dense(1, activation="sigmoid", kernel_initializer=he_avg_init ,kernel_regularizer=tf.keras.regularizers.l2(0.01),
                                name='output'))
model.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input (Dense)               (None, 64)                3264      
                                                                 
 dropout (Dropout)           (None, 64)                0         
                                                                 
 hidden1 (Dense)             (None, 30)                1950      
                                                                 
 dropout_1 (Dropout)         (None, 30)                0         
                                                                 
 hidden2 (Dense)             (None, 20)                620       
                                                                 
 dropout_2 (Dropout)         (None, 20)                0         
                                                                 
 hidden3 (Dense)             (None, 15)                315       
                                                                 
 dropout_3 (Dropout)         (None, 15)                0         
                                                                 
 hidden4 (Dense)             (None, 5)                 80        
                                                                 
 output (Dense)              (None, 1)                 6         
                                                                 
=================================================================
Total params: 6235 (24.36 KB)
Trainable params: 6235 (24.36 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
In [102]:
def visualize_model(model):
    fig, ax = plt.subplots(figsize=(12, 8))
    ax.axis('off')

    # Layer sizes
    layer_sizes = [layer.get_config()['units'] for layer in model.layers if isinstance(layer, keras.layers.Dense)]

    # Draw nodes
    for i, size in enumerate(layer_sizes):
        ax.scatter(np.repeat(i + 1, size), range(size), color='blue', s=300, zorder=2)

    # Draw edges
    for i in range(len(layer_sizes) - 1):
        ax.plot([i + 1, i + 2], [0.5 * layer_sizes[i], 0.5 * layer_sizes[i + 1]], color='gray', linewidth=2, zorder=1)

    # Add layer names
    for i, size in enumerate(layer_sizes):
        ax.text(i + 1, size + 2, model.layers[i].name, fontsize=12, ha='center')

    plt.title("Model Architecture", fontsize=16, fontweight='bold')
    plt.show()

# Display the creative model plot
visualize_model(model)
In [103]:
keras.utils.plot_model(model, "multi_input_and_output_model.png", show_shapes=True)
Out[103]:

The model is a feedforward neural network with one input layer, four hidden layers, and one output layer. The input layer has 64 units, and each hidden layer progressively reduces the number of units, ending with the output layer having 1 unit. The activation function used in the hidden layers is not specified, but it's common to use ReLU (Rectified Linear Unit) for such layers. The output layer uses a sigmoid activation, indicating a binary classification task.

Compiling the Model

In [104]:
optimizer = tf.keras.optimizers.Adam()
model.compile(loss="binary_crossentropy",optimizer=optimizer,metrics=["accuracy"])

Fitting the Model

In [105]:
# history = model.fit(x_train_pd, y_train_pd, epochs=100,validation_split=0.2,batch_size=32)
early_stopping_cb = tf.keras.callbacks.EarlyStopping(patience=10,restore_best_weights=True)


history = model.fit(x_train_pd, y_train_pd, epochs=100,validation_split=0.2 ,callbacks=[early_stopping_cb])
Epoch 1/100
WARNING:tensorflow:From C:\Users\abeda\anaconda3\Lib\site-packages\keras\src\utils\tf_utils.py:492: The name tf.ragged.RaggedTensorValue is deprecated. Please use tf.compat.v1.ragged.RaggedTensorValue instead.

WARNING:tensorflow:From C:\Users\abeda\anaconda3\Lib\site-packages\keras\src\engine\base_layer_utils.py:384: The name tf.executing_eagerly_outside_functions is deprecated. Please use tf.compat.v1.executing_eagerly_outside_functions instead.

536/536 [==============================] - 6s 6ms/step - loss: 1.9541 - accuracy: 0.6854 - val_loss: 1.0611 - val_accuracy: 0.9015
Epoch 2/100
536/536 [==============================] - 3s 5ms/step - loss: 0.8492 - accuracy: 0.8905 - val_loss: 0.6436 - val_accuracy: 0.9143
Epoch 3/100
536/536 [==============================] - 3s 5ms/step - loss: 0.5584 - accuracy: 0.9099 - val_loss: 0.4493 - val_accuracy: 0.9300
Epoch 4/100
536/536 [==============================] - 3s 5ms/step - loss: 0.4061 - accuracy: 0.9332 - val_loss: 0.3206 - val_accuracy: 0.9612
Epoch 5/100
536/536 [==============================] - 3s 5ms/step - loss: 0.3064 - accuracy: 0.9588 - val_loss: 0.2749 - val_accuracy: 0.9591
Epoch 6/100
536/536 [==============================] - 3s 5ms/step - loss: 0.2680 - accuracy: 0.9601 - val_loss: 0.2451 - val_accuracy: 0.9594
Epoch 7/100
536/536 [==============================] - 3s 5ms/step - loss: 0.2499 - accuracy: 0.9611 - val_loss: 0.2310 - val_accuracy: 0.9631
Epoch 8/100
536/536 [==============================] - 3s 5ms/step - loss: 0.2366 - accuracy: 0.9618 - val_loss: 0.2384 - val_accuracy: 0.9554
Epoch 9/100
536/536 [==============================] - 3s 5ms/step - loss: 0.2396 - accuracy: 0.9591 - val_loss: 0.2195 - val_accuracy: 0.9689
Epoch 10/100
536/536 [==============================] - 3s 5ms/step - loss: 0.2293 - accuracy: 0.9613 - val_loss: 0.2256 - val_accuracy: 0.9673
Epoch 11/100
536/536 [==============================] - 3s 5ms/step - loss: 0.2266 - accuracy: 0.9625 - val_loss: 0.2156 - val_accuracy: 0.9612
Epoch 12/100
536/536 [==============================] - 3s 5ms/step - loss: 0.2235 - accuracy: 0.9622 - val_loss: 0.2035 - val_accuracy: 0.9652
Epoch 13/100
536/536 [==============================] - 3s 5ms/step - loss: 0.2190 - accuracy: 0.9621 - val_loss: 0.2038 - val_accuracy: 0.9661
Epoch 14/100
536/536 [==============================] - 3s 5ms/step - loss: 0.2132 - accuracy: 0.9646 - val_loss: 0.2067 - val_accuracy: 0.9671
Epoch 15/100
536/536 [==============================] - 3s 5ms/step - loss: 0.2138 - accuracy: 0.9654 - val_loss: 0.1970 - val_accuracy: 0.9678
Epoch 16/100
536/536 [==============================] - 3s 5ms/step - loss: 0.2121 - accuracy: 0.9658 - val_loss: 0.1947 - val_accuracy: 0.9706
Epoch 17/100
536/536 [==============================] - 3s 5ms/step - loss: 0.2129 - accuracy: 0.9665 - val_loss: 0.1956 - val_accuracy: 0.9694
Epoch 18/100
536/536 [==============================] - 4s 7ms/step - loss: 0.2144 - accuracy: 0.9644 - val_loss: 0.1958 - val_accuracy: 0.9731
Epoch 19/100
536/536 [==============================] - 3s 5ms/step - loss: 0.2092 - accuracy: 0.9684 - val_loss: 0.1956 - val_accuracy: 0.9666
Epoch 20/100
536/536 [==============================] - 3s 6ms/step - loss: 0.2092 - accuracy: 0.9673 - val_loss: 0.1990 - val_accuracy: 0.9701
Epoch 21/100
536/536 [==============================] - 3s 5ms/step - loss: 0.2053 - accuracy: 0.9699 - val_loss: 0.1951 - val_accuracy: 0.9685
Epoch 22/100
536/536 [==============================] - 3s 5ms/step - loss: 0.2101 - accuracy: 0.9670 - val_loss: 0.1997 - val_accuracy: 0.9736
Epoch 23/100
536/536 [==============================] - 3s 5ms/step - loss: 0.2103 - accuracy: 0.9671 - val_loss: 0.1948 - val_accuracy: 0.9727
Epoch 24/100
536/536 [==============================] - 3s 5ms/step - loss: 0.2084 - accuracy: 0.9678 - val_loss: 0.1930 - val_accuracy: 0.9694
Epoch 25/100
536/536 [==============================] - 3s 5ms/step - loss: 0.2108 - accuracy: 0.9671 - val_loss: 0.2051 - val_accuracy: 0.9631
Epoch 26/100
536/536 [==============================] - 3s 5ms/step - loss: 0.2085 - accuracy: 0.9668 - val_loss: 0.1959 - val_accuracy: 0.9760
Epoch 27/100
536/536 [==============================] - 3s 5ms/step - loss: 0.2080 - accuracy: 0.9669 - val_loss: 0.1954 - val_accuracy: 0.9664
Epoch 28/100
536/536 [==============================] - 3s 5ms/step - loss: 0.2107 - accuracy: 0.9671 - val_loss: 0.1949 - val_accuracy: 0.9741
Epoch 29/100
536/536 [==============================] - 3s 5ms/step - loss: 0.2094 - accuracy: 0.9666 - val_loss: 0.1893 - val_accuracy: 0.9739
Epoch 30/100
536/536 [==============================] - 3s 5ms/step - loss: 0.2094 - accuracy: 0.9667 - val_loss: 0.2035 - val_accuracy: 0.9657
Epoch 31/100
536/536 [==============================] - 3s 5ms/step - loss: 0.2109 - accuracy: 0.9657 - val_loss: 0.2026 - val_accuracy: 0.9668
Epoch 32/100
536/536 [==============================] - 3s 5ms/step - loss: 0.2099 - accuracy: 0.9671 - val_loss: 0.1914 - val_accuracy: 0.9720
Epoch 33/100
536/536 [==============================] - 3s 5ms/step - loss: 0.2092 - accuracy: 0.9664 - val_loss: 0.1941 - val_accuracy: 0.9696
Epoch 34/100
536/536 [==============================] - 3s 6ms/step - loss: 0.2107 - accuracy: 0.9657 - val_loss: 0.1996 - val_accuracy: 0.9701
Epoch 35/100
536/536 [==============================] - 3s 6ms/step - loss: 0.2132 - accuracy: 0.9654 - val_loss: 0.1933 - val_accuracy: 0.9720
Epoch 36/100
536/536 [==============================] - 3s 5ms/step - loss: 0.2067 - accuracy: 0.9687 - val_loss: 0.1958 - val_accuracy: 0.9720
Epoch 37/100
536/536 [==============================] - 3s 5ms/step - loss: 0.2086 - accuracy: 0.9674 - val_loss: 0.1915 - val_accuracy: 0.9767
Epoch 38/100
536/536 [==============================] - 3s 5ms/step - loss: 0.2086 - accuracy: 0.9688 - val_loss: 0.1978 - val_accuracy: 0.9750
Epoch 39/100
536/536 [==============================] - 3s 5ms/step - loss: 0.2097 - accuracy: 0.9682 - val_loss: 0.1978 - val_accuracy: 0.9743
In [106]:
history.params
Out[106]:
{'verbose': 1, 'epochs': 100, 'steps': 536}
In [107]:
import plotly.graph_objects as go

# Assuming you have history containing accuracy and loss information
acc_train = history.history['accuracy']
acc_val = history.history['val_accuracy']
loss_train = history.history['loss']
loss_val = history.history['val_loss']

# Create a subplot with two plots
fig = go.Figure()

# Plot Training and Validation Accuracy
fig.add_trace(go.Scatter(x=list(range(1, len(acc_train) + 1)), y=acc_train, mode='lines+markers', name='Training Accuracy'))
fig.add_trace(go.Scatter(x=list(range(1, len(acc_val) + 1)), y=acc_val, mode='lines+markers', name='Validation Accuracy'))

# Set layout for the first subplot
fig.update_layout(
    title='Training and Validation Accuracy',
    xaxis=dict(title='Epochs'),
    yaxis=dict(title='Accuracy'),
)

# Add a new subplot for Loss
fig.add_trace(go.Scatter(x=list(range(1, len(loss_train) + 1)), y=loss_train, mode='lines+markers', name='Training Loss'))
fig.add_trace(go.Scatter(x=list(range(1, len(loss_val) + 1)), y=loss_val, mode='lines+markers', name='Validation Loss'))

# Set layout for the second subplot
fig.update_layout(
    title='Training and Validation Loss',
    xaxis2=dict(title='Epochs'),
    yaxis2=dict(title='Loss'),
)

# Update the layout to have subplots
fig.update_layout(
    updatemenus=[
        dict(
            x=0.5,
            y=1.15,
            xanchor='center',
            yanchor='top',
            buttons=list([
                dict(label='Accuracy',
                     method='relayout',
                     args=['yaxis', dict(title='Accuracy')]),
                dict(label='Loss',
                     method='relayout',
                     args=['yaxis', dict(title='Loss')]),
            ]),
        ),
    ],
)

# Show the plot
fig.show()

🛡️ Here are some observations 🕵️‍♂️

Training Accuracy:

The training accuracy is increasing, reaching a value of around 0.9816 in the last epoch. This indicates that model is learning from the training data and improving its performance.

Validation Accuracy:

The validation accuracy is also high, reaching approximately 0.9813 in the last epoch. This suggests that your model is generalizing well to unseen data, which is a positive sign.

Loss Values:

The training and validation loss values are decreasing, which is expected during the training process. Lower loss values indicate better convergence.

Consistency:

The training and validation metrics seem to be consistent, which is a good sign of a well-trained model.

Potential Overfitting:

The training accuracy is higher than the validation accuracy, but the difference is not significant. However, it's worth monitoring for potential overfitting, especially if there is a larger gap between training and validation accuracy in later epochs.

model appears to be training well

</div>

Sequential trained Confusion Matrix

In [108]:
y_proba = model.predict(x_train_pd)
# Create a confusion matrix
cm = confusion_matrix(y_train_pd, (y_proba>0.5 ).astype(int))

# Define class names and colors
class_names = ['Normal', 'Anomaly']
colorscale = [[0, '#66c2ff'], [1, '#ff9999']]

# Create a heatmap using Plotly
fig = ff.create_annotated_heatmap(
    z=cm,
    x=class_names,
    y=class_names,
    colorscale=colorscale,
    showscale=True
)

# Update layout for better visualization
fig.update_layout(
    title='Confusion Matrix',
    title_font_size=20,
    xaxis=dict(title='Predicted Label', side='bottom'),
    yaxis=dict(title='True Label'),
)

# Show the creative confusion matrix
fig.show()
670/670 [==============================] - 2s 2ms/step
In [109]:
cm = (y_proba > 0.5).astype(int)

# Calculate metrics
seq_a = accuracy_score(y_train_pd, cm)
seq_p = precision_score(y_train_pd, cm)
seq_r = recall_score(y_train_pd, cm)
seq_f = f1_score(y_train_pd, cm)

# Visualize the metrics
labels = ['Accuracy', 'Precision', 'Recall', 'F1 Score']
values = [seq_a, seq_p, seq_r, seq_f]

# Plotting
plt.figure(figsize=(10, 6))
sns.barplot(x=values, y=labels, palette='viridis')
plt.title('Model Evaluation Metrics')
plt.xlabel('Metric Value')
plt.ylabel('Metrics')
plt.show()

Evaluating the Model

In [110]:
test_loss_percentage,test_accuracy_percentage=model.evaluate(x_test_pd, y_test_pd)
test_accuracy_percentage=test_accuracy_percentage*100
test_loss_percentage=test_loss_percentage*100
# Create a Radar Chart
fig = go.Figure()

# Add a radar layer for Test Accuracy
fig.add_trace(go.Scatterpolar(
    r=[test_accuracy_percentage],
    theta=['Test Accuracy'],
    fill='toself',
    name='Test Accuracy',
    line=dict(color='#66c2ff')
))

# Add a radar layer for Test Loss
fig.add_trace(go.Scatterpolar(
    r=[test_loss_percentage],
    theta=['Test Loss'],
    fill='toself',
    name='Test Loss',
    line=dict(color='#ff9999')
))

# Set layout parameters for the radar chart
fig.update_layout(
    polar=dict(
        radialaxis=dict(visible=True, range=[0, 100]),
    ),
    showlegend=True,
    title='Model Performance on Test Data',
    title_font_size=20
)

# Display the Creative Radar Chart
fig.show()
119/119 [==============================] - 1s 3ms/step - loss: 0.1837 - accuracy: 0.9749
In [111]:
# Assuming you have true labels (y_test_pd) and predicted labels
# Replace predicted_labels with the actual predictions from your model
predicted_labels = model.predict((x_test_pd))

# Convert predicted probabilities to class labels (assuming binary classification)
predicted_classes = (predicted_labels > 0.5).astype(int)

# Create confusion matrix
cm = confusion_matrix(y_test_pd, predicted_classes)

# Display the confusion matrix using seaborn
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", xticklabels=['normal', 'anomaly'], yticklabels=['normal', 'anomaly'])
plt.xlabel("Predicted")
plt.ylabel("True")
plt.title("Confusion Matrix")
plt.show()
119/119 [==============================] - 0s 3ms/step
In [112]:
model.save('my_model.keras')
In [113]:
model=keras.models.load_model('my_model.keras')

Functional API

In [129]:
#Clear global state in memory to avoid clutter from old models.
tf.keras.backend.clear_session()

Building the Model

In [130]:
Image(filename='FunctionalApi.png')
Out[130]:
In [136]:
# Define three input layers with different shapes
input_A = tf.keras.layers.Input(shape=[25] , name='input_A')
input_B = tf.keras.layers.Input(shape=[15] ,name='input_B')
input_C = tf.keras.layers.Input(shape=[10] ,name='input_C')

leaky_relu = tf.keras.layers.LeakyReLU(alpha=0.2)

# Build the first hidden layers for input_A and input_B
hidden1_0 = tf.keras.layers.Dense(15, activation=leaky_relu, kernel_initializer="he_normal",kernel_regularizer=tf.keras.regularizers.l2(0.01))(input_B)
hidden1_1 = tf.keras.layers.Dense(25, activation="relu", kernel_initializer="he_normal",kernel_regularizer=tf.keras.regularizers.l2(0.01))(input_A)

# Concatenate the first hidden layers with input_C
concat1 = tf.keras.layers.concatenate([hidden1_0, input_C])

# Build the second hidden layers for the concatenated output
hidden2_0 = tf.keras.layers.Dense(45, activation=leaky_relu, kernel_initializer="he_normal",kernel_regularizer=tf.keras.regularizers.l2(0.01))(concat1)
hidden2_1 = tf.keras.layers.Dense(40, activation="relu", kernel_initializer="he_normal",kernel_regularizer=tf.keras.regularizers.l2(0.01))(hidden1_1)

# Concatenate the second hidden layers
concat2 = tf.keras.layers.concatenate([hidden2_0, hidden2_1])


he_avg_init = tf.keras.initializers.VarianceScaling(scale=2., mode="fan_avg",
                                                    distribution="uniform")



# Output layer with sigmoid activation for binary classification
output = tf.keras.layers.Dense(1, activation="sigmoid",kernel_initializer=he_avg_init, name='main_output',kernel_regularizer=tf.keras.regularizers.l2(0.01))(concat2)

# Auxiliary output for regularization or monitoring
aux_output = tf.keras.layers.Dense(1, activation="sigmoid", kernel_initializer=he_avg_init, name='aux_output',kernel_regularizer=tf.keras.regularizers.l2(0.01))(hidden2_0)

# Create a Keras Model with multiple inputs and outputs
model_functional_API = tf.keras.Model(inputs=[input_A, input_B, input_C], outputs=[output, aux_output])

this code defines a neural network model with three different input branches (input_A, input_B, and input_C). The model consists of multiple hidden layers, and the final output is a binary classification using sigmoid activation. Additionally, there is an auxiliary output (aux_output) branching from one of the hidden layers, which can be used for regularization or monitoring during training. The model is created using the Keras Functional API.

In [137]:
model_functional_API.summary()
Model: "model_1"
__________________________________________________________________________________________________
 Layer (type)                Output Shape                 Param #   Connected to                  
==================================================================================================
 input_B (InputLayer)        [(None, 15)]                 0         []                            
                                                                                                  
 dense_4 (Dense)             (None, 15)                   240       ['input_B[0][0]']             
                                                                                                  
 input_C (InputLayer)        [(None, 10)]                 0         []                            
                                                                                                  
 input_A (InputLayer)        [(None, 25)]                 0         []                            
                                                                                                  
 concatenate_2 (Concatenate  (None, 25)                   0         ['dense_4[0][0]',             
 )                                                                   'input_C[0][0]']             
                                                                                                  
 dense_5 (Dense)             (None, 25)                   650       ['input_A[0][0]']             
                                                                                                  
 dense_6 (Dense)             (None, 45)                   1170      ['concatenate_2[0][0]']       
                                                                                                  
 dense_7 (Dense)             (None, 40)                   1040      ['dense_5[0][0]']             
                                                                                                  
 concatenate_3 (Concatenate  (None, 85)                   0         ['dense_6[0][0]',             
 )                                                                   'dense_7[0][0]']             
                                                                                                  
 main_output (Dense)         (None, 1)                    86        ['concatenate_3[0][0]']       
                                                                                                  
 aux_output (Dense)          (None, 1)                    46        ['dense_6[0][0]']             
                                                                                                  
==================================================================================================
Total params: 3232 (12.62 KB)
Trainable params: 3232 (12.62 KB)
Non-trainable params: 0 (0.00 Byte)
__________________________________________________________________________________________________
In [138]:
from tensorflow.keras.utils import plot_model

# Assuming the code for creating the model is provided above

# Save the visualization to a file
plot_model(model_functional_API, to_file='model_plot.png', show_shapes=True, show_layer_names=True)
Out[138]:

🛡️ Architecture: 🕵️

Input Layers:

Three input layers with different shapes are defined: input_A (shape=[25]), input_B (shape=[15]), and input_C (shape=[10]).

Hidden Layers:

Two sets of hidden layers are created: hidden1_0 and hidden1_1 process input_A and input_B, respectively. These outputs are then concatenated with input_C to form concat1. The second hidden layer, concat2, is formed by processing concat1 through hidden2_0 and hidden2_1.

Output Layers:

The final output layer, named 'main_output,' uses a sigmoid activation for binary classification. An auxiliary output, 'aux_output,' is also created from the second hidden layer (hidden2_0) for regularization or monitoring purposes.

Model:

The Keras Model, named 'model_functional_API,' takes three inputs (input_A, input_B, and input_C) and produces two outputs (main_output and aux_output).

</div>

Compiling the Model

In [146]:
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3, clipvalue=1.0, ema_momentum=0.95)

model_functional_API.compile(
    loss=("binary_crossentropy", "binary_crossentropy"),
    loss_weights=(0.9, 0.1),
    optimizer=optimizer,
    metrics=["accuracy", "accuracy"]  # Use consistent metric names
)

Fitting the Model

In [147]:
# Corrected slicing for inputA_train, inputB_train, and inputC_train
inputA_train, inputB_train, inputC_train = x_train_pd.iloc[:, :25], x_train_pd.iloc[:, 25:40], x_train_pd.iloc[:, 40:]

# Corrected slicing for inputA_test, inputB_test, and inputC_test
inputA_test, inputB_test, inputC_test = x_test_pd[:, :25], x_test_pd[:, 25:40], x_test_pd[:, 40:]


my_callbacks = [
    tf.keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True)]

history_functional_API = model_functional_API.fit([inputA_train, inputB_train,inputC_train],
                                                  [y_train_pd,y_train_pd], 
                                                  epochs=100,
                                                  validation_split=0.2,callbacks=my_callbacks)
Epoch 1/100
536/536 [==============================] - 5s 5ms/step - loss: 1.4408 - main_output_loss: 0.1977 - aux_output_loss: 0.2464 - main_output_accuracy: 0.9302 - aux_output_accuracy: 0.9036 - val_loss: 0.6464 - val_main_output_loss: 0.1263 - val_aux_output_loss: 0.1670 - val_main_output_accuracy: 0.9608 - val_aux_output_accuracy: 0.9402
Epoch 2/100
536/536 [==============================] - 3s 5ms/step - loss: 0.4509 - main_output_loss: 0.1300 - aux_output_loss: 0.1755 - main_output_accuracy: 0.9585 - aux_output_accuracy: 0.9371 - val_loss: 0.3294 - val_main_output_loss: 0.1187 - val_aux_output_loss: 0.1673 - val_main_output_accuracy: 0.9601 - val_aux_output_accuracy: 0.9458
Epoch 3/100
536/536 [==============================] - 3s 5ms/step - loss: 0.2985 - main_output_loss: 0.1255 - aux_output_loss: 0.1783 - main_output_accuracy: 0.9594 - aux_output_accuracy: 0.9411 - val_loss: 0.2649 - val_main_output_loss: 0.1174 - val_aux_output_loss: 0.1765 - val_main_output_accuracy: 0.9598 - val_aux_output_accuracy: 0.9430
Epoch 4/100
536/536 [==============================] - 2s 5ms/step - loss: 0.2629 - main_output_loss: 0.1256 - aux_output_loss: 0.1851 - main_output_accuracy: 0.9588 - aux_output_accuracy: 0.9388 - val_loss: 0.2507 - val_main_output_loss: 0.1194 - val_aux_output_loss: 0.1814 - val_main_output_accuracy: 0.9610 - val_aux_output_accuracy: 0.9339
Epoch 5/100
536/536 [==============================] - 4s 7ms/step - loss: 0.2510 - main_output_loss: 0.1259 - aux_output_loss: 0.1881 - main_output_accuracy: 0.9594 - aux_output_accuracy: 0.9382 - val_loss: 0.2432 - val_main_output_loss: 0.1236 - val_aux_output_loss: 0.1907 - val_main_output_accuracy: 0.9608 - val_aux_output_accuracy: 0.9360
Epoch 6/100
536/536 [==============================] - 3s 6ms/step - loss: 0.2426 - main_output_loss: 0.1255 - aux_output_loss: 0.1904 - main_output_accuracy: 0.9598 - aux_output_accuracy: 0.9361 - val_loss: 0.2326 - val_main_output_loss: 0.1175 - val_aux_output_loss: 0.1849 - val_main_output_accuracy: 0.9605 - val_aux_output_accuracy: 0.9435
Epoch 7/100
536/536 [==============================] - 3s 6ms/step - loss: 0.2358 - main_output_loss: 0.1237 - aux_output_loss: 0.1894 - main_output_accuracy: 0.9604 - aux_output_accuracy: 0.9379 - val_loss: 0.2279 - val_main_output_loss: 0.1172 - val_aux_output_loss: 0.1845 - val_main_output_accuracy: 0.9610 - val_aux_output_accuracy: 0.9330
Epoch 8/100
536/536 [==============================] - 3s 6ms/step - loss: 0.2328 - main_output_loss: 0.1245 - aux_output_loss: 0.1915 - main_output_accuracy: 0.9604 - aux_output_accuracy: 0.9357 - val_loss: 0.2259 - val_main_output_loss: 0.1204 - val_aux_output_loss: 0.1928 - val_main_output_accuracy: 0.9584 - val_aux_output_accuracy: 0.9374
Epoch 9/100
536/536 [==============================] - 3s 5ms/step - loss: 0.2298 - main_output_loss: 0.1243 - aux_output_loss: 0.1923 - main_output_accuracy: 0.9600 - aux_output_accuracy: 0.9361 - val_loss: 0.2225 - val_main_output_loss: 0.1185 - val_aux_output_loss: 0.1875 - val_main_output_accuracy: 0.9570 - val_aux_output_accuracy: 0.9311
Epoch 10/100
536/536 [==============================] - 3s 5ms/step - loss: 0.2279 - main_output_loss: 0.1246 - aux_output_loss: 0.1930 - main_output_accuracy: 0.9595 - aux_output_accuracy: 0.9356 - val_loss: 0.2258 - val_main_output_loss: 0.1218 - val_aux_output_loss: 0.1878 - val_main_output_accuracy: 0.9636 - val_aux_output_accuracy: 0.9421
Epoch 11/100
536/536 [==============================] - 3s 5ms/step - loss: 0.2259 - main_output_loss: 0.1237 - aux_output_loss: 0.1922 - main_output_accuracy: 0.9605 - aux_output_accuracy: 0.9360 - val_loss: 0.2417 - val_main_output_loss: 0.1420 - val_aux_output_loss: 0.2067 - val_main_output_accuracy: 0.9475 - val_aux_output_accuracy: 0.9276
Epoch 12/100
536/536 [==============================] - 2s 4ms/step - loss: 0.2254 - main_output_loss: 0.1255 - aux_output_loss: 0.1938 - main_output_accuracy: 0.9591 - aux_output_accuracy: 0.9349 - val_loss: 0.2167 - val_main_output_loss: 0.1160 - val_aux_output_loss: 0.1813 - val_main_output_accuracy: 0.9612 - val_aux_output_accuracy: 0.9367
Epoch 13/100
536/536 [==============================] - 2s 4ms/step - loss: 0.2237 - main_output_loss: 0.1249 - aux_output_loss: 0.1918 - main_output_accuracy: 0.9595 - aux_output_accuracy: 0.9378 - val_loss: 0.2226 - val_main_output_loss: 0.1239 - val_aux_output_loss: 0.1972 - val_main_output_accuracy: 0.9573 - val_aux_output_accuracy: 0.9297
Epoch 14/100
536/536 [==============================] - 3s 5ms/step - loss: 0.2230 - main_output_loss: 0.1242 - aux_output_loss: 0.1942 - main_output_accuracy: 0.9590 - aux_output_accuracy: 0.9338 - val_loss: 0.2155 - val_main_output_loss: 0.1158 - val_aux_output_loss: 0.1823 - val_main_output_accuracy: 0.9652 - val_aux_output_accuracy: 0.9444
Epoch 15/100
536/536 [==============================] - 2s 4ms/step - loss: 0.2216 - main_output_loss: 0.1247 - aux_output_loss: 0.1914 - main_output_accuracy: 0.9590 - aux_output_accuracy: 0.9371 - val_loss: 0.2251 - val_main_output_loss: 0.1276 - val_aux_output_loss: 0.1989 - val_main_output_accuracy: 0.9556 - val_aux_output_accuracy: 0.9286
Epoch 16/100
536/536 [==============================] - 2s 4ms/step - loss: 0.2211 - main_output_loss: 0.1248 - aux_output_loss: 0.1926 - main_output_accuracy: 0.9585 - aux_output_accuracy: 0.9369 - val_loss: 0.2164 - val_main_output_loss: 0.1205 - val_aux_output_loss: 0.1920 - val_main_output_accuracy: 0.9643 - val_aux_output_accuracy: 0.9412
Epoch 17/100
536/536 [==============================] - 2s 4ms/step - loss: 0.2201 - main_output_loss: 0.1242 - aux_output_loss: 0.1931 - main_output_accuracy: 0.9595 - aux_output_accuracy: 0.9351 - val_loss: 0.2188 - val_main_output_loss: 0.1225 - val_aux_output_loss: 0.1903 - val_main_output_accuracy: 0.9626 - val_aux_output_accuracy: 0.9409
Epoch 18/100
536/536 [==============================] - 2s 4ms/step - loss: 0.2193 - main_output_loss: 0.1242 - aux_output_loss: 0.1921 - main_output_accuracy: 0.9599 - aux_output_accuracy: 0.9381 - val_loss: 0.2129 - val_main_output_loss: 0.1171 - val_aux_output_loss: 0.1849 - val_main_output_accuracy: 0.9601 - val_aux_output_accuracy: 0.9379
Epoch 19/100
536/536 [==============================] - 2s 4ms/step - loss: 0.2190 - main_output_loss: 0.1242 - aux_output_loss: 0.1918 - main_output_accuracy: 0.9601 - aux_output_accuracy: 0.9371 - val_loss: 0.2122 - val_main_output_loss: 0.1174 - val_aux_output_loss: 0.1879 - val_main_output_accuracy: 0.9619 - val_aux_output_accuracy: 0.9391
Epoch 20/100
536/536 [==============================] - 2s 4ms/step - loss: 0.2184 - main_output_loss: 0.1240 - aux_output_loss: 0.1922 - main_output_accuracy: 0.9589 - aux_output_accuracy: 0.9354 - val_loss: 0.2155 - val_main_output_loss: 0.1218 - val_aux_output_loss: 0.1881 - val_main_output_accuracy: 0.9603 - val_aux_output_accuracy: 0.9395
Epoch 21/100
536/536 [==============================] - 3s 5ms/step - loss: 0.2175 - main_output_loss: 0.1241 - aux_output_loss: 0.1921 - main_output_accuracy: 0.9605 - aux_output_accuracy: 0.9371 - val_loss: 0.2144 - val_main_output_loss: 0.1213 - val_aux_output_loss: 0.1900 - val_main_output_accuracy: 0.9568 - val_aux_output_accuracy: 0.9381
Epoch 22/100
536/536 [==============================] - 2s 4ms/step - loss: 0.2175 - main_output_loss: 0.1239 - aux_output_loss: 0.1922 - main_output_accuracy: 0.9595 - aux_output_accuracy: 0.9355 - val_loss: 0.2112 - val_main_output_loss: 0.1164 - val_aux_output_loss: 0.1884 - val_main_output_accuracy: 0.9610 - val_aux_output_accuracy: 0.9409
Epoch 23/100
536/536 [==============================] - 2s 4ms/step - loss: 0.2171 - main_output_loss: 0.1241 - aux_output_loss: 0.1924 - main_output_accuracy: 0.9603 - aux_output_accuracy: 0.9365 - val_loss: 0.2134 - val_main_output_loss: 0.1204 - val_aux_output_loss: 0.1902 - val_main_output_accuracy: 0.9615 - val_aux_output_accuracy: 0.9372
Epoch 24/100
536/536 [==============================] - 2s 4ms/step - loss: 0.2161 - main_output_loss: 0.1229 - aux_output_loss: 0.1918 - main_output_accuracy: 0.9605 - aux_output_accuracy: 0.9371 - val_loss: 0.2100 - val_main_output_loss: 0.1170 - val_aux_output_loss: 0.1834 - val_main_output_accuracy: 0.9608 - val_aux_output_accuracy: 0.9391
Epoch 25/100
536/536 [==============================] - 2s 4ms/step - loss: 0.2168 - main_output_loss: 0.1240 - aux_output_loss: 0.1925 - main_output_accuracy: 0.9597 - aux_output_accuracy: 0.9357 - val_loss: 0.2097 - val_main_output_loss: 0.1159 - val_aux_output_loss: 0.1856 - val_main_output_accuracy: 0.9598 - val_aux_output_accuracy: 0.9395
Epoch 26/100
536/536 [==============================] - 2s 5ms/step - loss: 0.2166 - main_output_loss: 0.1239 - aux_output_loss: 0.1926 - main_output_accuracy: 0.9592 - aux_output_accuracy: 0.9369 - val_loss: 0.2100 - val_main_output_loss: 0.1173 - val_aux_output_loss: 0.1839 - val_main_output_accuracy: 0.9587 - val_aux_output_accuracy: 0.9372
Epoch 27/100
536/536 [==============================] - 3s 5ms/step - loss: 0.2166 - main_output_loss: 0.1247 - aux_output_loss: 0.1920 - main_output_accuracy: 0.9585 - aux_output_accuracy: 0.9365 - val_loss: 0.2186 - val_main_output_loss: 0.1260 - val_aux_output_loss: 0.1916 - val_main_output_accuracy: 0.9545 - val_aux_output_accuracy: 0.9314
Epoch 28/100
536/536 [==============================] - 2s 4ms/step - loss: 0.2171 - main_output_loss: 0.1254 - aux_output_loss: 0.1923 - main_output_accuracy: 0.9580 - aux_output_accuracy: 0.9374 - val_loss: 0.2101 - val_main_output_loss: 0.1178 - val_aux_output_loss: 0.1901 - val_main_output_accuracy: 0.9596 - val_aux_output_accuracy: 0.9372
Epoch 29/100
536/536 [==============================] - 2s 4ms/step - loss: 0.2159 - main_output_loss: 0.1236 - aux_output_loss: 0.1929 - main_output_accuracy: 0.9595 - aux_output_accuracy: 0.9356 - val_loss: 0.2083 - val_main_output_loss: 0.1177 - val_aux_output_loss: 0.1870 - val_main_output_accuracy: 0.9603 - val_aux_output_accuracy: 0.9379
Epoch 30/100
536/536 [==============================] - 2s 4ms/step - loss: 0.2158 - main_output_loss: 0.1248 - aux_output_loss: 0.1921 - main_output_accuracy: 0.9594 - aux_output_accuracy: 0.9361 - val_loss: 0.2100 - val_main_output_loss: 0.1185 - val_aux_output_loss: 0.1850 - val_main_output_accuracy: 0.9591 - val_aux_output_accuracy: 0.9388
Epoch 31/100
536/536 [==============================] - 2s 4ms/step - loss: 0.2157 - main_output_loss: 0.1248 - aux_output_loss: 0.1911 - main_output_accuracy: 0.9593 - aux_output_accuracy: 0.9371 - val_loss: 0.2108 - val_main_output_loss: 0.1200 - val_aux_output_loss: 0.1915 - val_main_output_accuracy: 0.9615 - val_aux_output_accuracy: 0.9363
Epoch 32/100
536/536 [==============================] - 2s 4ms/step - loss: 0.2157 - main_output_loss: 0.1246 - aux_output_loss: 0.1936 - main_output_accuracy: 0.9583 - aux_output_accuracy: 0.9356 - val_loss: 0.2095 - val_main_output_loss: 0.1174 - val_aux_output_loss: 0.1825 - val_main_output_accuracy: 0.9619 - val_aux_output_accuracy: 0.9358
Epoch 33/100
536/536 [==============================] - 2s 5ms/step - loss: 0.2147 - main_output_loss: 0.1237 - aux_output_loss: 0.1913 - main_output_accuracy: 0.9597 - aux_output_accuracy: 0.9357 - val_loss: 0.2094 - val_main_output_loss: 0.1209 - val_aux_output_loss: 0.1921 - val_main_output_accuracy: 0.9577 - val_aux_output_accuracy: 0.9402
Epoch 34/100
536/536 [==============================] - 2s 4ms/step - loss: 0.2144 - main_output_loss: 0.1236 - aux_output_loss: 0.1920 - main_output_accuracy: 0.9606 - aux_output_accuracy: 0.9366 - val_loss: 0.2092 - val_main_output_loss: 0.1189 - val_aux_output_loss: 0.1864 - val_main_output_accuracy: 0.9594 - val_aux_output_accuracy: 0.9405
Epoch 35/100
536/536 [==============================] - 2s 4ms/step - loss: 0.2145 - main_output_loss: 0.1245 - aux_output_loss: 0.1922 - main_output_accuracy: 0.9599 - aux_output_accuracy: 0.9360 - val_loss: 0.2076 - val_main_output_loss: 0.1160 - val_aux_output_loss: 0.1831 - val_main_output_accuracy: 0.9608 - val_aux_output_accuracy: 0.9442
Epoch 36/100
536/536 [==============================] - 2s 4ms/step - loss: 0.2149 - main_output_loss: 0.1250 - aux_output_loss: 0.1916 - main_output_accuracy: 0.9594 - aux_output_accuracy: 0.9363 - val_loss: 0.2090 - val_main_output_loss: 0.1175 - val_aux_output_loss: 0.1886 - val_main_output_accuracy: 0.9633 - val_aux_output_accuracy: 0.9407
Epoch 37/100
536/536 [==============================] - 2s 4ms/step - loss: 0.2143 - main_output_loss: 0.1230 - aux_output_loss: 0.1920 - main_output_accuracy: 0.9590 - aux_output_accuracy: 0.9354 - val_loss: 0.2121 - val_main_output_loss: 0.1218 - val_aux_output_loss: 0.1858 - val_main_output_accuracy: 0.9561 - val_aux_output_accuracy: 0.9349
Epoch 38/100
536/536 [==============================] - 3s 5ms/step - loss: 0.2146 - main_output_loss: 0.1245 - aux_output_loss: 0.1914 - main_output_accuracy: 0.9593 - aux_output_accuracy: 0.9368 - val_loss: 0.2106 - val_main_output_loss: 0.1215 - val_aux_output_loss: 0.1902 - val_main_output_accuracy: 0.9566 - val_aux_output_accuracy: 0.9297
Epoch 39/100
536/536 [==============================] - 2s 4ms/step - loss: 0.2144 - main_output_loss: 0.1244 - aux_output_loss: 0.1919 - main_output_accuracy: 0.9601 - aux_output_accuracy: 0.9356 - val_loss: 0.2072 - val_main_output_loss: 0.1158 - val_aux_output_loss: 0.1858 - val_main_output_accuracy: 0.9608 - val_aux_output_accuracy: 0.9409
Epoch 40/100
536/536 [==============================] - 3s 5ms/step - loss: 0.2141 - main_output_loss: 0.1241 - aux_output_loss: 0.1922 - main_output_accuracy: 0.9595 - aux_output_accuracy: 0.9364 - val_loss: 0.2079 - val_main_output_loss: 0.1178 - val_aux_output_loss: 0.1827 - val_main_output_accuracy: 0.9596 - val_aux_output_accuracy: 0.9386
Epoch 41/100
536/536 [==============================] - 2s 4ms/step - loss: 0.2145 - main_output_loss: 0.1249 - aux_output_loss: 0.1922 - main_output_accuracy: 0.9588 - aux_output_accuracy: 0.9356 - val_loss: 0.2104 - val_main_output_loss: 0.1193 - val_aux_output_loss: 0.1872 - val_main_output_accuracy: 0.9636 - val_aux_output_accuracy: 0.9409
Epoch 42/100
536/536 [==============================] - 2s 4ms/step - loss: 0.2138 - main_output_loss: 0.1240 - aux_output_loss: 0.1916 - main_output_accuracy: 0.9601 - aux_output_accuracy: 0.9381 - val_loss: 0.2104 - val_main_output_loss: 0.1216 - val_aux_output_loss: 0.1876 - val_main_output_accuracy: 0.9570 - val_aux_output_accuracy: 0.9314
Epoch 43/100
536/536 [==============================] - 2s 4ms/step - loss: 0.2140 - main_output_loss: 0.1250 - aux_output_loss: 0.1907 - main_output_accuracy: 0.9591 - aux_output_accuracy: 0.9375 - val_loss: 0.2086 - val_main_output_loss: 0.1189 - val_aux_output_loss: 0.1896 - val_main_output_accuracy: 0.9594 - val_aux_output_accuracy: 0.9349
Epoch 44/100
536/536 [==============================] - 2s 5ms/step - loss: 0.2144 - main_output_loss: 0.1249 - aux_output_loss: 0.1925 - main_output_accuracy: 0.9598 - aux_output_accuracy: 0.9360 - val_loss: 0.2074 - val_main_output_loss: 0.1189 - val_aux_output_loss: 0.1906 - val_main_output_accuracy: 0.9582 - val_aux_output_accuracy: 0.9365
Epoch 45/100
536/536 [==============================] - 3s 5ms/step - loss: 0.2143 - main_output_loss: 0.1252 - aux_output_loss: 0.1925 - main_output_accuracy: 0.9593 - aux_output_accuracy: 0.9369 - val_loss: 0.2093 - val_main_output_loss: 0.1200 - val_aux_output_loss: 0.1882 - val_main_output_accuracy: 0.9610 - val_aux_output_accuracy: 0.9360
Epoch 46/100
536/536 [==============================] - 2s 4ms/step - loss: 0.2138 - main_output_loss: 0.1247 - aux_output_loss: 0.1921 - main_output_accuracy: 0.9601 - aux_output_accuracy: 0.9366 - val_loss: 0.2081 - val_main_output_loss: 0.1199 - val_aux_output_loss: 0.1934 - val_main_output_accuracy: 0.9582 - val_aux_output_accuracy: 0.9372
Epoch 47/100
536/536 [==============================] - 3s 6ms/step - loss: 0.2135 - main_output_loss: 0.1242 - aux_output_loss: 0.1928 - main_output_accuracy: 0.9593 - aux_output_accuracy: 0.9361 - val_loss: 0.2081 - val_main_output_loss: 0.1177 - val_aux_output_loss: 0.1850 - val_main_output_accuracy: 0.9601 - val_aux_output_accuracy: 0.9356
Epoch 48/100
536/536 [==============================] - 3s 5ms/step - loss: 0.2140 - main_output_loss: 0.1248 - aux_output_loss: 0.1920 - main_output_accuracy: 0.9588 - aux_output_accuracy: 0.9363 - val_loss: 0.2103 - val_main_output_loss: 0.1199 - val_aux_output_loss: 0.1909 - val_main_output_accuracy: 0.9591 - val_aux_output_accuracy: 0.9311
Epoch 49/100
536/536 [==============================] - 2s 5ms/step - loss: 0.2130 - main_output_loss: 0.1243 - aux_output_loss: 0.1925 - main_output_accuracy: 0.9593 - aux_output_accuracy: 0.9372 - val_loss: 0.2099 - val_main_output_loss: 0.1196 - val_aux_output_loss: 0.1845 - val_main_output_accuracy: 0.9582 - val_aux_output_accuracy: 0.9321
In [148]:
history_functional_API.params
Out[148]:
{'verbose': 1, 'epochs': 100, 'steps': 536}
In [149]:
# Assuming you have history containing accuracy and loss information
acc_train = history_functional_API.history['val_main_output_accuracy']
acc_val =history_functional_API.history['val_main_output_accuracy']
loss_train = history_functional_API.history['loss']
loss_val = history_functional_API.history['val_loss']

# Create a subplot with two plots
fig = go.Figure()

# Plot Training and Validation Accuracy
fig.add_trace(go.Scatter(x=list(range(1, len(acc_train) + 1)), y=acc_train, mode='lines+markers', name='Training Accuracy'))
fig.add_trace(go.Scatter(x=list(range(1, len(acc_val) + 1)), y=acc_val, mode='lines+markers', name='Validation Accuracy'))

# Set layout for the first subplot
fig.update_layout(
    title='Training and Validation Accuracy',
    xaxis=dict(title='Epochs'),
    yaxis=dict(title='Accuracy'),
)

# Add a new subplot for Loss
fig.add_trace(go.Scatter(x=list(range(1, len(loss_train) + 1)), y=loss_train, mode='lines+markers', name='Training Loss'))
fig.add_trace(go.Scatter(x=list(range(1, len(loss_val) + 1)), y=loss_val, mode='lines+markers', name='Validation Loss'))

# Set layout for the second subplot
fig.update_layout(
    title='Training and Validation Loss',
    xaxis=dict(title='Epochs'),
    yaxis=dict(title='Loss'),
)

# Show the plot
fig.show()

Main Confusion Matrix

In [150]:
# Assuming you have predictions for main and auxiliary outputs
y_pred_main, y_pred_aux = model_functional_API.predict((inputA_train, inputB_train, inputC_train))

# Convert predicted probabilities to class labels (assuming binary classification)
y_pred_main_classes = (y_pred_main > 0.5).astype(int)
y_pred_aux_classes = (y_pred_aux > 0.5).astype(int)

# Calculate metrics
main_acc = accuracy_score(y_train_pd, y_pred_main_classes)
aux_acc = accuracy_score(y_train_pd, y_pred_aux_classes)
main_precision = precision_score(y_train_pd, y_pred_main_classes)
aux_precision = precision_score(y_train_pd, y_pred_aux_classes)
main_recall = recall_score(y_train_pd, y_pred_main_classes)
aux_recall = recall_score(y_train_pd, y_pred_aux_classes)
main_f1 = f1_score(y_train_pd, y_pred_main_classes)
aux_f1 = f1_score(y_train_pd, y_pred_aux_classes)
670/670 [==============================] - 2s 3ms/step
In [151]:
cmain=confusion_matrix(y_train_pd,y_pred_main_classes)
# Display the confusion matrix using seaborn
plt.figure(figsize=(8, 6))
sns.heatmap(cmain, annot=True, fmt="d", cmap="Blues", xticklabels=['normal', 'anomaly'], yticklabels=['normal', 'anomaly'])
plt.xlabel("Predicted")
plt.ylabel("True")
plt.title("Confusion Matrix")
plt.show()

Aux Confusion Matrix

In [152]:
cmain=confusion_matrix(y_train_pd,y_pred_aux_classes)
# Display the confusion matrix using seaborn
plt.figure(figsize=(8, 6))
sns.heatmap(cmain, annot=True, fmt="d", cmap="Blues", xticklabels=['normal', 'anomaly'], yticklabels=['normal', 'anomaly'])
plt.xlabel("Predicted")
plt.ylabel("True")
plt.title("Confusion Matrix")
plt.show()
In [153]:
# Bar plot
labels = ['Main Output', 'Auxiliary Output']
accuracy_values = [main_acc, aux_acc]
precision_values = [main_precision, aux_precision]
recall_values = [main_recall, aux_recall]
f1_values = [main_f1, aux_f1]

x = np.arange(len(labels))
width = 0.2

fig, ax = plt.subplots(figsize=(10, 6))
rects1 = ax.bar(x - width, accuracy_values, width, label='Accuracy')
rects2 = ax.bar(x, precision_values, width, label='Precision')
rects3 = ax.bar(x + width, recall_values, width, label='Recall')
rects4 = ax.bar(x + 2 * width, f1_values, width, label='F1 Score')

ax.set_ylabel('Scores')
ax.set_title('Main and Auxiliary Output Metrics')
ax.set_xticks(x + width)
ax.set_xticklabels(labels)
ax.legend()

fig.tight_layout()
plt.show()

Evaluating the Model

In [189]:
# Assuming you have the evaluation results
eval_results = model_functional_API.evaluate((inputA_test, inputB_test, inputC_test), (y_test_pd, y_test_pd, y_test_pd))

# Extracting individual metrics
loss, main_output_accuracy, aux_output_accuracy = eval_results[0], eval_results[3], eval_results[4]

# Bar plot
labels = ['Loss','Main Output Accuracy', 'Aux Accuracy']
values = [loss, main_output_accuracy, aux_output_accuracy]

x = np.arange(len(labels))
width = 0.5

fig, ax = plt.subplots(figsize=(8, 6))
rects = ax.bar(x, values, width, label='Metrics')

ax.set_ylabel('Values')
ax.set_title('Evaluation Metrics')
ax.set_xticks(x)
ax.set_xticklabels(labels)
ax.legend()

# Display the values on top of the bars
for rect in rects:
    height = rect.get_height()
    ax.annotate(f'{height:.4f}', xy=(rect.get_x() + rect.get_width() / 2, height),
                xytext=(0, 3),  # 3 points vertical offset
                textcoords="offset points",
                ha='center', va='bottom')

fig.tight_layout()
plt.show()
119/119 [==============================] - 1s 3ms/step - loss: 0.2092 - main_output_loss: 0.1176 - aux_output_loss: 0.1888 - main_output_accuracy: 0.9624 - aux_output_accuracy: 0.9426
In [156]:
# Assuming you have true labels (y_test_pd) and predicted labels
# Replace predicted_labels with the actual predictions from your model
predicted_labels = model_functional_API.predict((inputA_test, inputB_test, inputC_test))[0]

# Convert predicted probabilities to class labels (assuming binary classification)
predicted_classes = (predicted_labels > 0.5).astype(int)

# Create confusion matrix
cm = confusion_matrix(y_test_pd, predicted_classes)

# Display the confusion matrix using seaborn
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", xticklabels=['normal', 'anomaly'], yticklabels=['normal', 'anomaly'])
plt.xlabel("Predicted")
plt.ylabel("True")
plt.title("Confusion Matrix")
plt.show()
119/119 [==============================] - 0s 3ms/step

Functional API VS Sequential API

This plot represents a comparison between sequential neural netowrk and functional Api

In [157]:
# Define the data
data = {
    'Sequential': [seq_f, seq_r, seq_p, seq_a],
    'Functional': [main_f1, main_recall, main_precision, main_acc]
}

# Create a DataFrame
df = pd.DataFrame(data, index=['F1', 'Recall', 'Precision', 'Accuracy'])

# Transpose the DataFrame
df = df.transpose()

# Create traces for each metric
traces = []
for metric in df.index:
    trace = go.Bar(
        x=df.columns,
        y=df.loc[metric],
        name=metric
    )
    traces.append(trace)

# Create layout
layout = go.Layout(
    title='Evaluation Metrics for Functional API VS Sequential API',
    xaxis=dict(title='Classifier'),
    yaxis=dict(title='Score'),
    barmode='group'
)

# Create figure
fig = go.Figure(data=traces, layout=layout)

# Show the interactive plot
fig.show()

Keras Tuner

In [158]:
tf.keras.backend.clear_session()
In [179]:
# Define the Keras Tuner search space
def build_model(hp):
    # Define hyperparameters to tune
    units_hidden1 = hp.Int('units_hidden1', min_value=16, max_value=64, step=16)
    l2_reg = hp.Float('l2_reg', min_value=1e-5, max_value=1e-3, sampling='LOG')
    units_hidden2 = hp.Int('units_hidden2', min_value=16, max_value=64, step=16)
    units_hidden3 = hp.Int('units_hidden3', min_value=32, max_value=64, step=32)
    units_hidden4 = hp.Int('units_hidden4', min_value=64, max_value=128, step=32)
    l2_reg_output = hp.Float('l2_reg_output', min_value=1e-5, max_value=1e-3, sampling='LOG')
    l2_reg_aux = hp.Float('l2_reg_aux', min_value=1e-5, max_value=1e-3, sampling='LOG')
    learning_rate = hp.Float('learning_rate', min_value=1e-4, max_value=1e-2, sampling='LOG')

    # Use tunable hyperparameters in your model
    hidden1_0 = Dense(units_hidden1, activation=leaky_relu, kernel_initializer="he_normal", kernel_regularizer=tf.keras.regularizers.l2(l2_reg))(input_A)
    hidden1_1 = Dense(15, activation="relu", kernel_initializer="he_normal", kernel_regularizer=tf.keras.regularizers.l2(l2_reg))(input_B)
    concat1 = concatenate([hidden1_0, input_C])
    hidden2_0 = Dense(units_hidden2, activation=leaky_relu, kernel_initializer="he_normal", kernel_regularizer=tf.keras.regularizers.l2(l2_reg))(concat1)
    hidden2_1 = Dense(units_hidden3, activation="relu", kernel_initializer="he_normal", kernel_regularizer=tf.keras.regularizers.l2(l2_reg))(hidden1_1)
    concat2 = concatenate([hidden2_0, hidden2_1])
    output = Dense(1, activation="sigmoid", kernel_initializer=he_avg_init, name='main_output', kernel_regularizer=tf.keras.regularizers.l2(l2_reg_output))(concat2)
    aux_output = Dense(1, activation="sigmoid", kernel_initializer=he_avg_init, name='aux_output', kernel_regularizer=tf.keras.regularizers.l2(l2_reg_aux))(hidden2_0)

    model = tf.keras.Model(inputs=[input_A, input_B, input_C], outputs=[output, aux_output])

    # Compile the model
    optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate, clipvalue=1.0, ema_momentum=0.95)
    model.compile(optimizer=optimizer, loss=['binary_crossentropy', 'binary_crossentropy'], loss_weights=[0.9, 0.1], metrics=["accuracy"])

    return model

# Create a wrapper function for the tuner
def wrapper_custom_objective(trial):
    return -custom_objective(trial)  # Minimize the negative of the objective

This code defines a Keras Tuner search space for tuning hyperparameters in a functional API model. The build_model function specifies the hyperparameters to be tuned and incorporates them into the model architecture. The wrapper_custom_objective function wraps a custom objective function for optimization. The hyperparameters include the number of hidden units, regularization terms, and learning rate. The model has two outputs with different loss weights, and it is compiled accordingly. The architecture is based on the provided functional API model.

In [180]:
# Define the search space and tuner
tuner = RandomSearch(
    build_model,
    objective=keras_tuner.Objective("val_main_output_accuracy", direction="max"),  # Specify the direction
    max_trials=15,
    directory='tuner_directory',
    project_name='functional_api_tuning'
)
Reloading Tuner from tuner_directory\functional_api_tuning\tuner0.json
In [181]:
my_callbacks = [
    tf.keras.callbacks.EarlyStopping(patience=2, restore_best_weights=True),
    tf.keras.callbacks.ModelCheckpoint("weights.keras",save_best_only=True),
]

# Perform the tuning
tuner.search([inputA_train, inputB_train, inputC_train], (y_train_pd, y_train_pd), epochs=10 ,validation_split=0.2 ,callbacks=my_callbacks)
Trial 15 Complete [00h 00m 29s]
val_main_output_accuracy: 0.9722157120704651

Best val_main_output_accuracy So Far: 0.988792896270752
Total elapsed time: 00h 07m 50s

This code will perform random search hyperparameter tuning using the Keras Tuner library and save the best hyperparameters in the specified directory. The process will run for a maximum of 21 trials.

In [182]:
best_model = tuner.get_best_models()[0]
In [183]:
best_model.summary()
Model: "model"
__________________________________________________________________________________________________
 Layer (type)                Output Shape                 Param #   Connected to                  
==================================================================================================
 input_A (InputLayer)        [(None, 25)]                 0         []                            
                                                                                                  
 dense (Dense)               (None, 64)                   1664      ['input_A[0][0]']             
                                                                                                  
 input_C (InputLayer)        [(None, 10)]                 0         []                            
                                                                                                  
 input_B (InputLayer)        [(None, 15)]                 0         []                            
                                                                                                  
 concatenate (Concatenate)   (None, 74)                   0         ['dense[0][0]',               
                                                                     'input_C[0][0]']             
                                                                                                  
 dense_1 (Dense)             (None, 15)                   240       ['input_B[0][0]']             
                                                                                                  
 dense_2 (Dense)             (None, 16)                   1200      ['concatenate[0][0]']         
                                                                                                  
 dense_3 (Dense)             (None, 64)                   1024      ['dense_1[0][0]']             
                                                                                                  
 concatenate_1 (Concatenate  (None, 80)                   0         ['dense_2[0][0]',             
 )                                                                   'dense_3[0][0]']             
                                                                                                  
 main_output (Dense)         (None, 1)                    81        ['concatenate_1[0][0]']       
                                                                                                  
 aux_output (Dense)          (None, 1)                    17        ['dense_2[0][0]']             
                                                                                                  
==================================================================================================
Total params: 4226 (16.51 KB)
Trainable params: 4226 (16.51 KB)
Non-trainable params: 0 (0.00 Byte)
__________________________________________________________________________________________________

Main Confusion Matrix

In [184]:
# Assuming you have predictions for main and auxiliary outputs
y_pred_main_best_model, y_pred_aux_best_model = best_model.predict((inputA_train, inputB_train, inputC_train))

# Convert predicted probabilities to class labels (assuming binary classification)
y_pred_main_classes_best_model = (y_pred_main_best_model > 0.5).astype(int)
y_pred_aux_classes_best_model = (y_pred_aux_best_model > 0.5).astype(int)

# Calculate metrics
main_acc_best_model = accuracy_score(y_train_pd, y_pred_main_classes_best_model)
aux_acc_best_model = accuracy_score(y_train_pd, y_pred_aux_classes_best_model)
main_precision_best_model = precision_score(y_train_pd, y_pred_main_classes_best_model)
aux_precision_best_model = precision_score(y_train_pd, y_pred_aux_classes_best_model)
main_recall_best_model = recall_score(y_train_pd, y_pred_main_classes_best_model)
aux_recall_best_model = recall_score(y_train_pd, y_pred_aux_classes_best_model)
main_f1_best_model = f1_score(y_train_pd, y_pred_main_classes_best_model)
aux_f1_best_model = f1_score(y_train_pd, y_pred_aux_classes_best_model)
670/670 [==============================] - 2s 3ms/step
In [185]:
cmain=confusion_matrix(y_train_pd,y_pred_main_classes_best_model)
# Display the confusion matrix using seaborn
plt.figure(figsize=(8, 6))
sns.heatmap(cmain, annot=True, fmt="d", cmap="Blues", xticklabels=['normal', 'anomaly'], yticklabels=['normal', 'anomaly'])
plt.xlabel("Predicted")
plt.ylabel("True")
plt.title("Confusion Matrix")
plt.show()

Aux Confusion Matrix

In [186]:
cmain=confusion_matrix(y_train_pd,y_pred_aux_classes_best_model)
# Display the confusion matrix using seaborn
plt.figure(figsize=(8, 6))
sns.heatmap(cmain, annot=True, fmt="d", cmap="Blues", xticklabels=['normal', 'anomaly'], yticklabels=['normal', 'anomaly'])
plt.xlabel("Predicted")
plt.ylabel("True")
plt.title("Confusion Matrix")
plt.show()
In [187]:
# Bar plot
labels = ['Main Output', 'Auxiliary Output']
accuracy_values_best_model = [main_acc_best_model, aux_acc_best_model]
precision_values_best_model = [main_precision_best_model, aux_precision_best_model]
recall_values_best_model = [main_recall_best_model, aux_recall_best_model]
f1_values_best_model = [main_f1_best_model, aux_f1_best_model]

x = np.arange(len(labels))
width = 0.2

fig, ax = plt.subplots(figsize=(10, 6))
rects1 = ax.bar(x - width, accuracy_values_best_model, width, label='Accuracy')
rects2 = ax.bar(x, precision_values_best_model, width, label='Precision')
rects3 = ax.bar(x + width, recall_values_best_model, width, label='Recall')
rects4 = ax.bar(x + 2 * width, f1_values_best_model, width, label='F1 Score')

ax.set_ylabel('Scores')
ax.set_title('Main and Auxiliary Output Metrics')
ax.set_xticks(x + width)
ax.set_xticklabels(labels)
ax.legend()

fig.tight_layout()
plt.show()

Evaluating the Model

In [195]:
# Assuming you have true labels (y_test_pd) and predicted labels
# Replace predicted_labels with the actual predictions from your model
predicted_labels_best_model = best_model.predict((inputA_test, inputB_test, inputC_test))[0]

# Convert predicted probabilities to class labels (assuming binary classification)
predicted_classes_best_model = (predicted_labels_best_model > 0.5).astype(int)

# Create confusion matrix
cm = confusion_matrix(y_test_pd, predicted_classes_best_model)

# Display the confusion matrix using seaborn
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", xticklabels=['normal', 'anomaly'], yticklabels=['normal', 'anomaly'])
plt.xlabel("Predicted")
plt.ylabel("True")
plt.title("Confusion Matrix")
plt.show()
119/119 [==============================] - 0s 3ms/step

Functional API VS Functional_best_model

This plot represents a comparison between Functional_best_model neural netowrk and functional Api

In [193]:
# Define the data
data_best = {
    'Functional': [main_f1, main_recall, main_precision, main_acc],
    'Functional_best_model': [main_f1_best_model, main_recall_best_model, main_precision_best_model, main_acc_best_model]
}

# Create a DataFrame
df_best = pd.DataFrame(data_best, index=['F1', 'Recall', 'Precision', 'Accuracy'])

# Transpose the DataFrame
df_best = df_best.transpose()

# Create traces for each metric
traces = []
for metric in df_best.index:
    trace = go.Bar(
        x=df_best.columns,
        y=df_best.loc[metric],
        name=metric
    )
    traces.append(trace)

# Create layout
layout = go.Layout(
    title='Evaluation Metrics for Functional API VS Functional Best Model',
    xaxis=dict(title='Classifier'),
    yaxis=dict(title='Score'),
    barmode='group'
)

# Create figure
fig = go.Figure(data=traces, layout=layout)

# Show the interactive plot
fig.show()

You are welcome